1 SUMMARY


OS kernels form the backbone of all system software. They can have the greatest impact on
the resilience, extensibility, and security of today’s computing hosts. Recent effort on seL4 has
demonstrated the feasibility of building large scale formal proofs of functional correctness for a
general-purpose microkernel, but the cost of such verification is still prohibitive, and it is unclear
how to use such a verified kernel to reason about user-level programs and other kernel extensions.

Under this DARPA CRASH effort (FA8750-10-2-0254), the PI (Principal Investigator) and his
team have developed a clean-slate CertiKOS hypervisor kernel that runs on Intel and AMD multicore
platforms with hardware virtualization and can boot Linux and ROS applications in its multiple
virtual machines. A version of CertiKOS is now deployed on all the ground vehicle platforms
(LandShark UGV and American Built Car) in the DARPA HACMS program.

The PI and his team have also developed a new set of certified programming methodologies and
tools that support programming and composing certified abstraction layers (in C or assembly) and
can verify contextual safety, correctness, liveness, and security properties in one unified setting.

Using these new languages and tools, they developed a new compositional architecture for
building certified OS kernels. Because the very purpose of an OS kernel is to build layers of
abstraction over hardware resources, they insisted on uncovering and specifying these layers
formally, and then verifying each kernel module at its proper abstraction level. To support reasoning
about user-level programs and linking with other certified kernel extensions, they proved a strong
contextual refinement property for every kernel function, which states that the implementation of
each such function will behave like its specification under any kernel/user (or host/guest) context. To
demonstrate the effectiveness of this new approach, they have successfully specified a uniprocessor
variant of their full CertiKOS kernel and verified its (contextual) functional correctness property
in the Coq proof assistant. They showed how to extend their base kernel with new features such
as virtualization and ring-0 processes and how to quickly adapt existing verified layers to build
new certified kernels for different domains. Their certified hypervisor OS kernel is written in 5500
lines of C and x86 assembly, and can successfully boot a version of Linux as a guest. The entire
specification and proof effort took less than 1.5 person years.

They have also developed new semantics and logics for supporting Declarative Decentralized
Information Flow Control (DIFC) with declassification. They proposed a new framework which
advocates the use of an instrumented semantics for reasoning and the erasure semantics for execution.
Their new program logic can be used to verify security properties for low-level C or assembly
programs. They showed that they can prove a new form of non-interference properties even in the
presence of declassification. This technology is now being ported into their CertiKOS kernels.

They have also developed new ground-breaking certified resource analysis tools and new logics
for verifying safety and liveness of fine-grained shared memory concurrent programs.

On the formal methods side, they have also made the first comprehensive study that aims
to address the architecture deficiencies in all of today’s proof assistants. They proposed a new
proof-assistant architecture that uses extensible conversion rules and static proof expressions to
support effective and principled proof development. They developed the design, the complete
metatheory, and a full compiler of a novel programming language called VeriML which realizes the new
architecture and also offers a unified platform for coding all kinds of computation on logical terms.

2 INTRODUCTION


Operating  System  (OS)  kernels  and  hypervisors  form  the  backbone  of  every  safety-critical  software 
system  in  the  world.  Hence  it  is  highly  desirable  to  formally  verify  the  correctness  of  these 
programs  [35].  Recent  work  on  seL4  [19, 20]  has  shown  that  it  is  feasible  to  formally  prove  the 
functional  correctness  property  of  a  general-purpose  microkernel,  but  the  cost  of  such  verification  is 
still quite prohibitive. It took the seL4 team more than 11 person years (effort for tool development
excluded)  to  verify  7500  lines  of  sequential  C  code,  yet  the  resulting  kernel  still  contains  1200  lines 
of  additional  C  code  and  600  lines  of  assembly  code  that  are  not  verified.  Worse  still,  even  after  all 
these  efforts,  the  current  verified  seL4  kernel  cannot  be  used  to  reason  about  user-level  programs  as 
it  does  not  verify  important  features  such  as  virtual-memory  page  faults  and  address  translation. 

What  makes  the  verification  of  OS  kernels  so  challenging? 

First,  OS  kernels  are  complex  artifacts;  they  contain  many  interdependent  components  that  are 
difficult  to  untangle.  Their  invariants  can  involve  machine  level  details  (e.g.,  how  the  virtual  memory 
hardware  works)  but  can  also  cut  across  multiple  abstraction  boundaries  (e.g.,  different  views  of  an 
address  space  under  kernel/user  or  host/guest  modes).  Several  researchers  [1,41]  observed  that  even 
writing  down  a  good  and  easy-to-maintain  formal  specification  alone  is  already  a  major  roadblock 
for  any  such  verification  effort. 

Second,  OS  kernels  are  often  written  in  C,  which  only  supports  limited  forms  of  abstraction. 
Verification  of  C  programs  is  especially  hard  if  they  manipulate  low-level  data  structures  (e.g., 
thread  queues,  allocation  tables).  The  seL4  effort  used  an  intermediate  executable  specification 
(derived  from  a  Haskell  prototype)  to  hide  some  messy  C  specifics,  but  this  alone  is  not  enough  for 
enforcing  abstraction  among  different  kernel  components;  seL4  had  to  introduce  capabilities  which 
add  significant  implementation  complexities  to  the  kernel. 

Third,  OS  kernels  are  developed  for  managing  and  multiplexing  hardware,  so  it  is  important  to 
have  a  machine  model  that  can  describe  hardware  details.  The  C  language  is  too  high  level  for  this 
purpose.  For  example,  while  most  kernel  code  can  be  written  in  C,  many  key  kernel  concepts  (e.g., 
context  switches,  address  translation,  page  fault  handling)  can  only  be  given  accurate  semantics 
at  the  assembly  level.  Consequently,  we  need  a  formal  assembly  model  to  define  many  kernel 
behaviors,  but  we  also  want  to  verify  most  kernel  code  at  a  much  higher  abstraction  level. 

Fourth,  OS  kernel  verification  would  not  scale  if  it  does  not  support  extensibility.  One  advantage 
of  a  verified  kernel  is  the  existence  of  formal  specifications  for  all  of  its  components.  In  theory, 
this  would  allow  us  to  add  certified  kernel  plug-ins  [36]  as  long  as  they  do  not  violate  any  existing 
kernel  invariants.  In  practice,  however,  if  we  are  unable  to  decompose  kernel  invariants  into  small 
independent  pieces,  even  modifying  an  existing  (or  adding  a  new)  verified  component  may  force  us 
to  rewrite  the  proofs  for  the  entire  kernel. 

Under this DARPA CRASH (Clean-Slate Design of Resilient, Adaptive, Secure Hosts) effort
(FA8750-10-2-0254), the PI and his team at Yale University have developed a novel compositional
approach  that  successfully  tackles  all  of  the  above  challenges  in  building  certified  OS  kernels.  They 
believe  that,  to  make  verification  scale  and  to  provide  strong  support  to  extensibility,  they  must  first 
have  a  compositional  specification  that  can  untangle  all  the  kernel  interdependencies.  Because  the 
very  purpose  of  an  OS  kernel  is  to  build  layers  of  abstraction  over  bare  machines,  they  insist  on 
meticulously  uncovering  and  specifying  these  layers  (done  in  the  Coq  proof  assistant  [40]),  and 
then  verifying  each  kernel  module  at  its  proper  abstraction  level. 

The functional correctness of an OS kernel (as done in seL4) is usually stated as a refinement
property. Roughly speaking, if $M_c$ stands for the C/assembly implementation of a kernel, $M_a$ for
its abstract functional specification, and $[\![\cdot]\!]$ for each’s corresponding state machine, then $M_c$ refines
$M_a$ if there exists a forward simulation [28] from $[\![M_c]\!]$ to $[\![M_a]\!]$ (denoted as $[\![M_c]\!] \sqsubseteq [\![M_a]\!]$).
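
As a rough illustration of the forward-simulation condition behind this refinement statement, the sketch below checks it by enumeration on two tiny finite transition systems. Everything in it (the `is_forward_simulation` helper, the toy `concrete` and `abstract` machines, and the relation `R`) is a hypothetical example and not part of seL4 or CertiKOS, whose proofs are carried out in a proof assistant rather than by exhaustive checking.

```python
# Toy check of a forward simulation between two finite labeled transition
# systems. All names here are illustrative assumptions.

def is_forward_simulation(conc, abst, relation, conc_init, abst_init):
    """Return True if `relation` is a forward simulation from conc to abst.

    conc, abst: dict mapping state -> set of (event, next_state) pairs.
    relation:   set of (concrete_state, abstract_state) pairs.
    """
    # The initial states must be related.
    if (conc_init, abst_init) not in relation:
        return False
    # Every concrete step from a related pair must be matched by an
    # abstract step with the same event that ends in a related pair.
    for c, a in relation:
        for event, c_next in conc.get(c, set()):
            matched = any(
                a_event == event and (c_next, a_next) in relation
                for a_event, a_next in abst.get(a, set())
            )
            if not matched:
                return False
    return True


# Concrete machine: a two-bit counter stored as a (high, low) bit pair.
concrete = {
    (0, 0): {("tick", (0, 1))},
    (0, 1): {("tick", (1, 0))},
    (1, 0): {("tick", (1, 1))},
    (1, 1): {("tick", (0, 0))},
}
# Abstract machine: the same counter as an integer modulo 4.
abstract = {n: {("tick", (n + 1) % 4)} for n in range(4)}
# Candidate simulation relation: the bit pair encodes the integer.
R = {((hi, lo), 2 * hi + lo) for hi in (0, 1) for lo in (0, 1)}

print(is_forward_simulation(concrete, abstract, R, (0, 0), 0))
```

Removing any pair from `R` that a concrete step lands in makes the check fail, which is the finite analogue of a simulation proof getting stuck on an unmatched step.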

Through such refinement, Klein et al. [19, 30, 33, 34] claimed that many properties established for
$M_a$ (e.g., confidentiality [30] when $M_a$ is deterministic) can be transferred to $M_c$.

This claim, unfortunately, fails to hold in the context of any interesting user-level programs.
If $P$ stands for a collection of user-level processes and $\bowtie$ for a linking operator, then from
$[\![M_c]\!] \sqsubseteq [\![M_a]\!]$ alone, we cannot derive $[\![M_c \bowtie P]\!] \sqsubseteq [\![M_a \bowtie P]\!]$. This is because the semantics
of running $P$ on top of $M_a$ (where virtual memory hardware is hidden) is different from that of
running $P$ on top of $M_c$ (where page faults and address translation do come into play). Daum et
al. [7] partially closed the gap by extending the original refinement proof to also track memory
permissions, but they still did not deal with page faults in their model of user transitions.

Under this new DARPA CRASH effort, the PI and his team instead prove the strong contextual
refinement property for all kernel modules directly: they show that for any kernel/user or host/guest
context code $P$, $[\![M_c \bowtie P]\!] \sqsubseteq [\![M_a \bowtie P]\!]$ always holds. This guarantees that they cannot overlook
any subtle difference between machines at different abstraction levels.

More  specifically,  they  developed  a  new  extensible  architecture  (called  CertiKOS)  for  building 
certified  OS  kernels.  CertiKOS  uses  contextual  refinement  as  the  unifying  formalism  for  composing 
kernel  and  user  components  at  different  abstraction  levels.  Each  abstraction  layer  is  defined  as  an 
assembly-level  machine  extended  with  a  particular  set  of  abstract  states  and  primitives.  However, 
most  of  their  kernel  programs  are  written  in  a  variant  of  C  (called  ClightX)  [14],  verified  at  the 
source  level,  and  compiled  and  linked  together  using  a  modified  version  [14]  of  the  CompCert 
verified  compiler  [21,22].  CertiKOS  is  the  first  architecture  that  can  truly  transfer  global  properties 
proved  for  user-level  programs  (at  the  kernel  specification  level)  down  to  the  concrete  assembly 
machine  level. 

Using  CertiKOS,  they  have  developed  a  fully  certified  mCertiKOS  kernel  in  Coq.  Unlike 
seL4,  they  decompose  the  specification  of  mCertiKOS  into  33  logical  abstraction  layers,  and 
turn  an  otherwise  prohibitive  verification  task  into  many  simple  and  easily  automatable  sub-tasks. 
The  resulting  kernel  is  a  certified  assembly  implementation  that  still  enjoys  a  high  degree  of 
compositionality.  Their  layered  specification  shows  that  interdependent  low-level  kernel  modules 
can indeed be untangled and given clear formal semantics.

Using mCertiKOS as the base, they have also built three additional certified kernels:
mCertiKOS-hyp extends mCertiKOS with virtualization support to form a hypervisor kernel; mCertiKOS-rz
extends mCertiKOS-hyp with “ring 0” processes (“certifiably safe” application programs
that can run safely inside the kernel address space, similar to SIPs in Singularity [17]); and
mCertiKOS-emb removes virtual memory and virtualization support from mCertiKOS-rz so that it only supports
“ring 0” processes.

They  have  done  a  detailed  evaluation  of  their  certified  development  effort,  including  kernel 
performance,  the  cost  of  layer  design  and  proof  development,  and  the  cost  of  building  new  extended 
(or  adapted)  kernels.  All  of  their  certified  kernels  are  practical  and  can  run  on  stock  x86  hardware. 
Their  certified  hypervisor  kernel  (mCertiKOS-hyp)  consists  of  5500  lines  of  C  and  x86  assembly, 
and  can  successfully  boot  a  version  of  Linux  as  a  guest.  The  entire  specification  and  proof  effort 
took  less  than  1.5  person  years. 

Finally,  in  addition  to  developing  new  cutting-edge  technologies  for  building  certified  OS 
kernels,  the  PI  and  his  team  have  also  made  significant  breakthroughs  on  the  following  problems: 

•  They  have  developed  a  clean-slate  hypervisor  kernel  that  runs  on  Intel  and  AMD  multicore 
platforms  with  hardware  virtualization  and  can  boot  Linux  and  ROS  (Robot  Operating  System) 
applications  in  its  multiple  virtual  machines.  This  hypervisor  kernel  is  now  deployed  on 
all  the  ground  vehicle  platforms  (LandShark  UGV  and  American  Built  Car)  in  the  DARPA 
HACMS (High-Assurance Cyber Military Systems) program.

•  They  have  developed  a  new  set  of  certified  programming  methodologies  and  tools  [14]  that 
support  programming  and  composing  certified  abstraction  layers  (in  C  or  assembly)  and  can 
verify  contextual  safety,  correctness,  liveness,  and  security  properties  in  one  unified  setting. 

•  They have developed new semantics and logics [6] for supporting Declarative Decentralized
Information Flow Control (DIFC) with declassification. They proposed a new framework
which advocates the use of an instrumented semantics for reasoning and the erasure semantics
for execution. Their new program logic can be used to verify security properties for low-level
C or assembly programs. They showed that they can prove a new form of non-interference
properties even in the presence of declassification. This technology is now being ported into
their CertiKOS kernels.

•  They  have  developed  new  ground-breaking  certified  resource  analysis  tools  [3,4]  and  new 
logics  [25,26]  for  verifying  safety  and  liveness  of  fine-grained  shared  memory  concurrent 
programs. 

•  They  have  made  the  first  comprehensive  study  [39]  that  aims  to  address  the  architecture 
deficiencies  in  all  of  today’s  proof  assistants.  They  proposed  a  new  proof-assistant  architecture 
that  uses  extensible  conversion  rules  and  static  proof  expressions  to  support  effective  and 
principled proof development. They developed the design, the complete metatheory, and a
full compiler of a novel programming language called VeriML [37-39] which realizes the
new architecture and also offers a unified platform for coding all kinds of computation on
logical terms.

3 METHODS, ASSUMPTIONS, AND PROCEDURES

The  ultimate  goal  of  research  on  certified  OS  kernels  is  not  just  to  verify  the  functional  correctness 
of  a  particular  kernel,  but  rather  to  find  the  best  OS  design  and  development  methodologies  that 
can  be  used  to  build  provably  reliable,  secure,  and  efficient  computer  systems  in  a  cost-effective 
way.  The  PI  and  his  team  at  Yale  enumerated  the  following  important  dimensions  of  concerns  and 
evaluation  metrics  which  they  have  used  so  far  to  guide  their  work  toward  realizing  this  goal: 

•  Support  for  new  kernel  design.  Traditional  OS  kernels  use  the  hardware-enforced  “red  line” 
to  define  a  single  system  call  API  (Application  Programming  Interface).  A  certified  OS  kernel 
opens  up  the  design  space  significantly  as  it  can  support  multiple  certified  kernel  APIs  at 
different  abstraction  levels.  It  is  important  to  support  kernel  extensions  [2, 10]  and  ring-0 
processes  [17]  so  we  can  experiment  and  find  the  best  trade-offs. 

•  Kernel performance. Verification should not impose significant overhead on kernel
performance. Of course, different kernel designs may imply different performance priorities. An
L4-like  microkernel  [27]  would  sacrifice  portability  for  faster  inter-process  communication 
(IPC)  while  a  Singularity-like  kernel  [17]  would  focus  on  efficient  support  for  type-safe  ring-0 
processes. 

•  Verification  of  global  properties.  A  certified  kernel  is  much  less  interesting  if  it  cannot  be 
used  to  prove  global  properties  of  the  complete  system  built  on  top  of  the  kernel.  Such  global 
properties  include  not  only  safety,  liveness,  and  security  properties  of  user-level  processes 
and  virtual  machines,  but  also  resource  usage  and  availability  properties  (e.g.,  to  counter 
denial-of-service  attacks). 

•  Quality  of  kernel  specification.  A  good  kernel  specification  should  capture  precisely  those 
contextually observable behaviors in the kernel implementation [14]. It must support
transferring global properties proved at a high abstraction level down to any lower abstraction
level. 

•  Cost of development and maintenance. Compositionality is the key to minimizing such costs.
If  the  machine  model  is  stable,  verification  of  each  kernel  module  should  only  need  to  be  done 
once  (to  show  that  it  implements  its  deep  functional  specification  [14]).  Global  properties 
should  be  derived  from  the  kernel  specification  alone. 

•  Quality  of  formal  proofs.  They  use  the  term  certified  kernels  rather  than  verified  kernels  to 
emphasize the importance of third-party machine-checkable proof certificates [35]. Handwritten
paper proofs are error-prone [18]. Program verification without machine-checkable
proofs  has  been  subject  to  significant  controversy  [8]. 

Figure 1: Certified OS kernels: what to prove?


3.1  Overview  of  the  CertiKOS  Approach 

Their new CertiKOS architecture aims to address all of the above concerns and also tackle all
four challenges described in Sec. 2. The CertiKOS architecture leverages the new languages and
tools [14] which the PI and his team have developed recently for building certified abstraction layers
with deep specifications.

A certified layer is a new language-based module construct that consists of a triple $(L_1, M, L_2)$
plus a mechanized proof object showing that the layer implementation $M$, built on top of the
interface $L_1$ (the underlay), is a contextual refinement of the desirable interface $L_2$ above (the
overlay). A deep specification (e.g., $L_2$) of a module (e.g., $M$) captures everything contextually
observable about running the module over its underlay (e.g., $L_1$). Once they have built a certified
layer $M$ with a deep specification $L_2$, there is no need to ever look at $M$ again, and any property
about $M$ can be proved using $L_2$ alone. Of course, if the semantics of the underlying abstract
machine (for $M$) changes, the deep specification for $M$ may also have to change.

Under CertiKOS, building a new certified kernel (or experimenting with a new design) is just a
matter of composing a collection of certified layers, developed in a variant of C (called ClightX) or
assembly. The PI and his team [14] have developed a powerful Coq library for supporting horizontal
and vertical composition of certified layers. They have also built a certified compiler (called
CompCertX) that can compile certified ClightX layers into certified assembly layers. CertiKOS
can thus enjoy the full programming power of an ANSI C variant and also the assembly language
to certify any efficient routines required by low-level kernel programming. The layer mechanism
allows them to certify most kernel components at higher abstraction levels, even though they all
eventually get mapped (or compiled) down to an assembly machine.

In Fig. 1, they use x86 to denote an assembly machine and $[\![\cdot]\!]_{\mathrm{x86}}$ for its whole-machine
semantics. Suppose they load such a machine with the mCertiKOS kernel $K$ (in assembly) and
user-level assembly code $P$; then proving any global property of such a complete system amounts
to reasoning about the semantic object $[\![K \bowtie P]\!]_{\mathrm{x86}}$.

Figure 2: Overview of the CertiKOS architecture

Reasoning at such a low level is difficult, so they formalize a new mCertiKOS machine that
extends the x86 machine with the deep specification of $K$. They use $[\![\cdot]\!]_{\mathrm{mCertiKOS}}$ to denote its
whole-machine semantics. The contextual refinement property about the mCertiKOS kernel can be
stated as $\forall P,\ [\![K \bowtie P]\!]_{\mathrm{x86}} \sqsubseteq [\![P]\!]_{\mathrm{mCertiKOS}}$. Hence any global property proved about $[\![P]\!]_{\mathrm{mCertiKOS}}$
can be transferred to $[\![K \bowtie P]\!]_{\mathrm{x86}}$.
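
One way to see why such a transfer goes through is sketched below. It rests on two simplifying assumptions that the report does not state: refinement is read as inclusion of observable behaviors, and a global property $\varphi$ is a predicate on individual behaviors. The symbol $\mathit{Beh}$ is introduced here only for illustration, as is the notation for the kernel $K$ linked with user code $P$ on the x86 machine.

```latex
\begin{align*}
&\text{Assume } \forall P.\ \mathit{Beh}\big([\![K \bowtie P]\!]_{\mathrm{x86}}\big)
   \subseteq \mathit{Beh}\big([\![P]\!]_{\mathrm{mCertiKOS}}\big)
  && \text{(contextual refinement)} \\
&\text{and } \forall b \in \mathit{Beh}\big([\![P]\!]_{\mathrm{mCertiKOS}}\big).\ \varphi(b)
  && \text{(property proved at the specification level)} \\
&\text{then } \forall b \in \mathit{Beh}\big([\![K \bowtie P]\!]_{\mathrm{x86}}\big).\ \varphi(b)
  && \text{(property holds on the assembly machine)}
\end{align*}
```

This argument covers properties of single behaviors. Properties that relate several behaviors, such as confidentiality, generally need further conditions, for instance determinism of the specification.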

In CertiKOS, they also use contextual refinement to support fine-grained layer decomposition
and linking. In Fig. 2, to build a certified kernel $K$, they decompose it into multiple kernel modules
$K_1, \ldots, K_n$, each sitting at its respective underlay ($L_0, \ldots, L_{n-1}$). Each such module ($K_i$) implements
the primitives in its overlay (i.e., $L_i$) but it can only call the primitives in its underlay ($L_{i-1}$). Using
vertical composition [14], from the contextual refinement $\forall P,\ [\![K_i \bowtie P]\!]_{i-1} \sqsubseteq [\![P]\!]_i$ of each layer
(they use $[\![\cdot]\!]_i$ to denote the semantics of the $L_i$ machine), they can deduce

$$\forall P,\ [\![K \bowtie P]\!]_0 = [\![K_1 \bowtie K_2 \bowtie \cdots \bowtie K_n \bowtie P]\!]_0 \sqsubseteq [\![K_2 \bowtie \cdots \bowtie K_n \bowtie P]\!]_1 \sqsubseteq \cdots \sqsubseteq [\![K_n \bowtie P]\!]_{n-1} \sqsubseteq [\![P]\!]_n.$$

If they instantiate $L_0$ and $L_n$ with the x86 and mCertiKOS layers, they get precisely the contextual
refinement property of the mCertiKOS kernel. They can also compose intermediate layers in the
same way; this makes it much easier to modify existing (or add new) certified kernel modules.
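
The bookkeeping side of vertical composition can be sketched as follows. This is only a toy model under stated assumptions: the `CertifiedLayer` record, `vertical_compose`, and `link_all` are hypothetical names, interfaces are reduced to plain strings, and the mechanized refinement proof that each real layer carries is not represented at all. The sketch shows just the rule that two layers compose when the overlay of the lower one is the underlay of the upper one.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple


@dataclass(frozen=True)
class CertifiedLayer:
    """Toy stand-in for a layer triple (underlay, modules, overlay)."""
    underlay: str             # interface the modules may call
    modules: Tuple[str, ...]  # names of the kernel modules implemented here
    overlay: str              # interface the modules implement


def vertical_compose(lower: CertifiedLayer, upper: CertifiedLayer) -> CertifiedLayer:
    """Stack `upper` on `lower`; their shared interface must match."""
    if lower.overlay != upper.underlay:
        raise ValueError(
            f"cannot stack {upper.modules} on {lower.modules}: "
            f"{lower.overlay} != {upper.underlay}"
        )
    return CertifiedLayer(lower.underlay, lower.modules + upper.modules, upper.overlay)


def link_all(layers):
    """Fold a bottom-to-top list of layers into one composite layer."""
    return reduce(vertical_compose, layers)


# Three hypothetical modules K1..K3 between interfaces L0 (bottom) and L3 (top).
layers = [
    CertifiedLayer("L0", ("K1",), "L1"),
    CertifiedLayer("L1", ("K2",), "L2"),
    CertifiedLayer("L2", ("K3",), "L3"),
]
kernel = link_all(layers)
print(kernel)
```

Replacing or inserting a module only requires that its underlay and overlay still match its neighbors, which is the structural reason the layered design keeps modifications local.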

What have they proved?  Using CertiKOS, they have successfully built multiple certified OS
kernels. For each such kernel, they have always constructed its deep specification and proved its
contextual functional correctness property, so all global properties proved at the specification level
can be transferred down to the lowest assembly machine.

From the functional correctness property, they immediately derive that all system calls and traps
will always run safely and also terminate, and that there will be no code injection attacks, no buffer
overflows, no null pointer accesses, no integer overflows, etc. They also proved that there is no stack
overflow or memory exhaustion in the kernel, using recent techniques also developed by the PI’s
team and collaborators [3,4]. They have also proved an isolation property between the virtual address spaces of
user-level processes. All of these properties were proved using the abstract specification provided at
the top layer, and then transferred to the lowest assembly machine via contextual refinement.


Assumptions and limitations  Outside their certified mCertiKOS kernel, there are only 163 lines
of C (for loading ELF binaries) and 38 lines of assembly code (for handling traps) that are not
verified.

The mCertiKOS kernel also relies on a bootloader, whose verification is left for future work.
The bottom-most x86 layer of their certified kernels is called PreInit, which initializes the drivers
(e.g., serial, disk, and console). Device drivers are not verified because their current machine semantics
lacks device models for expressing the corresponding semantics.

Their assembly-level machines do not cover the full x86 instruction set, so their contextual
correctness results only apply to programs in this subset. However, additional instructions and
features can be easily added if they have simple or no interaction with the kernel.

The CompCert assembler for converting assembly into machine code is also not verified. They
assume the correctness of the Coq proof checker and its code extraction mechanism.

Their current certified kernels assume a runtime environment consisting of a single processor,
but extending them to support multicore concurrency is already under way. Their choice of using
contextual refinement to compose layers is motivated partly by its close connection [13,26] with the
work on concurrent objects [15, 16].

Like most existing verified kernel efforts, they assume that interrupts are only enabled in
user or guest mode. The challenges in handling interrupts and preemption are similar to those
for concurrency [11, 12]. They believe that similar approaches can be readily supported in their
CertiKOS framework.


Comparison with seL4  The seL4 team [19] focused on verifying a particular microkernel. The
designers of the L4-family kernels [9,27] advocated the minimality principle: a concept is tolerated
inside the microkernel only if moving it outside the kernel would prevent the implementation
of the system’s required functionality. This is a reasonable principle but its interpretation of the
“kernel-user” boundary (as the hardware-enforced “red line”) is quite narrow. The PI and his team’s
new CertiKOS architecture advocates replacing the traditional “red line” with a large number of
certified abstraction layers enforced by formal specification and proofs; a hardware mechanism (such
as address protection) is just one (quick) way of ensuring that a specific process will not violate the
invariants required by a particular kernel abstraction layer.

As mentioned in Sec. 2, the seL4 team only proved the refinement property but not the contextual
refinement property, so the global properties (e.g., security [30, 34]) proved at the abstract
specification level cannot be transferred to the C-implementation level. The root cause of this
problem is the rather simplistic C-level state machine which they used to verify their 7500 lines of
C code. This machine is too high level to model several key OS features (e.g., kernel initialization,
context switches, address translation, and page-fault handling). Indeed, these features happen to
coincide with the unverified C and assembly code in their kernel.

Sewell  et  al.  [33]  used  translation  validation  to  build  a  refinement  proof  between  the  semantics 
of  the  verified  C  source  code  and  the  corresponding  binary  (compiled  by  GCC).  This  proof  is  not  as 
high  quality  as  the  rest  of  the  seL4  effort  because  it  was  not  done  in  a  proof  assistant  (thus  it  has  no 
machine-checkable  proof)  and  the  translation  validator  itself  still  has  not  been  verified. 

Even  with  this  work  by  Sewell  et  al.  [33],  the  previously  unverified  C  code  (1200  lines)  and 
assembly  code  (600  lines)  in  seL4  still  remain  unverified.  These  are  actually  quite  major  assumptions 
for  a  verified  kernel  because  they  include  the  correctness  of  context  switches,  kernel  initialization, 
address  translation,  and  linking  between  verified  C  and  assembly;  all  of  which  were  considered  as 
major  challenge  problems  by  many  researchers  working  in  this  field  [5, 11,31,32,41]. 

Using  CertiKOS,  the  PI  and  his  team  have  successfully  tackled  all  of  these  challenges:  context 
switches,  kernel  initialization,  address  translation,  and  page  fault  handling  are  all  certified.  All 
kernel  components  (in  C  and  assembly)  are  correctly  linked  together  to  form  a  complete  system  in 
an assembly machine, and all of their proofs are machine-checkable in Coq.

Much of the implementation complexity of the seL4 kernel lies in its support for capability-based
access  control.  Capabilities  are  important  in  seL4  as  they  are  used  to  prevent  unwanted  interference 
between  different  kernel  components.  However,  they  significantly  increase  the  complexity  of  the 
seL4  kernel.  In  contrast,  the  CertiKOS-family  kernels  the  PI  and  his  team  have  built  so  far  rely  on 
the  CompCert  memory  model  [23]  to  enforce  isolation  and  prove  contextual  refinement. 

3.2  Methods  and  Procedures:  Defining  Abstraction  Layers 

Contextual  refinement  provides  an  elegant  formalism  for  decomposing  the  verification  of  a  complex 
kernel  into  a  large  number  of  tractable  tasks:  the  PI  and  his  team  define  a  series  of  logical  abstraction 
layers,  which  serve  as  increasingly  higher-level  specifications  for  an  increasing  portion  of  the 
kernel  code.  They  design  these  abstract  layers  in  a  way  such  that  complex  interdependent  kernel 
components are untangled and converted into a well-organized kernel-object stack with a clean
specification.

Their  framework  specifies  an  abstraction  layer  using  five  components:  a  collection  of  objects,  a 
memory  model,  an  invariant  which  the  memory  and  objects  satisfy  at  any  point  of  the  execution,  an 
initialization  flag,  and  an  initialization  primitive.  These  five  components  define  a  logical  view  of  a 
subset  of  the  kernel  code  and  extend  our  language  with  an  abstract  specification  of  that  code.  On 
top  of  this  logical  view,  more  code  is  introduced  and  verified. 


Layer  objects  The  layer  objects  are  logical  abstractions  of  kernel  modules.  In  Fig.  3,  each  layer 
object  provides  a  set  of  abstract  states  (which  are  abstractions  of  the  module’s  private  memory)  and 
a  set  of  primitives  (which  are  abstractions  of  the  module’s  interface  specified  in  terms  of  the  abstract 
states).  Consecutive  layers  may  reuse  some  of  the  same  objects,  introduce  new  layer  objects  by 
verifying  additional  code,  or  hide  some  low-level  objects  which  are  used  to  implement  new  objects 
but  need  not  be  exposed  to  higher  layers.  Hiding  unnecessary  objects  facilitates  invariant  proofs 
since  they  can  often  use  stronger  invariants  at  higher  layers  that  would  otherwise  be  violated  by 
low-level  objects. 

For  example,  thread  queues  are  implemented  as  doubly-linked  lists  in  mCertiKOS,  and  the 
concrete implementations of the functions that manipulate queues (enqueue and dequeue) directly
manipulate  these  doubly-linked  lists  in  memory.  On  the  other  hand,  in  our  abstract  queue  layer 
object,  a  queue  is  just  a  simple  list  of  thread  identifiers,  and  the  enqueue  and  dequeue  primitives 
are  specified  directly  over  the  abstract  lists.  The  contextual  refinement  relation  between  the  two 
layers  (one  with  concrete  implementation  and  the  other  with  the  abstract  layer  object)  ensures 
that  any  kernel/user  context  code  (e.g.,  the  scheduler)  running  on  top  of  the  more  abstract  layer 
retains  an  equivalent  behavior  when  it  is  running  on  top  of  the  layer  with  corresponding  concrete 
implementation. 
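
As an illustration only (this is neither the mCertiKOS C code nor its Coq specification), the relation between the two queue layers can be sketched in Python. The names `ConcreteQueue`, `abs_enqueue`, `abs_dequeue`, `abstraction`, and the `NIL` marker are hypothetical. The concrete queue patches links in both directions, the abstract one is a plain list, and `abstraction` plays the role of the refinement relation.

```python
NIL = -1  # hypothetical "no thread" marker


class ConcreteQueue:
    """Doubly-linked list threaded through per-thread prev/next arrays."""

    def __init__(self, nthreads):
        self.head = self.tail = NIL
        self.prev = [NIL] * nthreads
        self.next = [NIL] * nthreads

    def enqueue(self, tid):
        # Append tid at the tail, patching links in both directions.
        self.prev[tid], self.next[tid] = self.tail, NIL
        if self.tail == NIL:
            self.head = tid
        else:
            self.next[self.tail] = tid
        self.tail = tid

    def dequeue(self):
        # Unlink and return the head, or NIL when the queue is empty.
        tid = self.head
        if tid == NIL:
            return NIL
        self.head = self.next[tid]
        if self.head == NIL:
            self.tail = NIL
        else:
            self.prev[self.head] = NIL
        self.prev[tid] = self.next[tid] = NIL
        return tid


def abs_enqueue(q, tid):
    """Abstract specification: a queue is a plain list of thread ids."""
    return q + [tid]


def abs_dequeue(q):
    """Abstract specification: return (dequeued id, remaining list)."""
    return (q[0], q[1:]) if q else (NIL, q)


def abstraction(cq):
    """Map a concrete queue to the abstract list it represents."""
    out, cur = [], cq.head
    while cur != NIL:
        out.append(cur)
        cur = cq.next[cur]
    return out
```

Running the same sequence of operations on both representations and comparing `abstraction(cq)` with the abstract list after every step is a testing analogue of the refinement proof; the actual development establishes this for all executions in Coq.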

As  shown  in  Fig.  3,  to  establish  the  contextual  refinement  relation  between  concrete  memory 
and  abstract  state,  they  use  CompCert  memory  permissions  [24]  at  the  higher  layer  to  prevent  the 
context  code  from  accessing  the  module’s  private  memory.  Note  that  these  permissions  do  not 
correspond to a physical protection mechanism, but instead are entirely logical: they ensure that the
higher-level abstract machine gets stuck whenever it executes code that directly accesses this private
memory. By proving our kernel is safe (it does not get stuck), they guarantee that this situation will
not happen.

Figure 4: (a) Machine memory model; (b) Abstract memory model


Memory  models  OS  kernels  must  manage  limited  physical  memory  and  provide  contiguous 
address  spaces  for  high-level  kernel  modules  and  user  programs.  Because  much  of  the  code  assumes 
that  the  memory  management  sets  up  the  virtual  address  space  properly,  initialization  has  been 
a  sticking  point  in  previous  verification  efforts  [20,41],  in  which  the  virtual  address  space  setup 
is  either  not  verified,  or  verified  separately  as  an  external  lemma.  They  address  this  challenge  by 
making  the  memory  model  explicit  in  our  abstraction  layers. 

Because  they  use  CompCertX  [14]  along  with  its  formalization  of  the  semantics  of  C  and 
assembly,  our  notion  of  memory  is  based  on  the  CompCert  memory  model  [24].  CompCert  employs 
a  unified  model  to  encode  different  views  of  memory.  The  memory  is  split  into  a  number  of  disjoint 
blocks and a pointer is represented by a pair (b, o), where b is a block identifier and o is an offset
within  block  b.  Each  offset  within  a  block  is  associated  with  a  permission  specifying  the  memory 
operations  that  can  be  performed  at  that  location.  A  program  which  attempts  to  perform  a  prohibited 
operation  will  get  stuck.  The  compiler’s  correctness  theorem  guarantees  that  the  target  program  will 
only  get  stuck  if  the  source  does;  thus  the  compiler  will  never  introduce  invalid  memory  operations 
into  a  correct  program. 
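
The block-and-permission idea can be sketched as follows. This is a simplified Python model written for illustration, not CompCert's Coq definition: the three-level `Perm` ordering, the `Memory` class, and the `Stuck` exception are assumptions of the sketch (CompCert distinguishes more permission levels and typed memory chunks).

```python
from enum import IntEnum


class Perm(IntEnum):
    # Simplified ordering; CompCert itself distinguishes more levels.
    NONE = 0
    READABLE = 1
    WRITABLE = 2


class Stuck(Exception):
    """Raised when the abstract machine has no valid transition."""


class Memory:
    def __init__(self):
        self.data = {}   # block id -> list of stored values
        self.perm = {}   # block id -> list of Perm, one per offset

    def alloc(self, size, perm=Perm.WRITABLE):
        b = len(self.data)             # fresh block identifier
        self.data[b] = [0] * size
        self.perm[b] = [perm] * size
        return b

    def _require(self, ptr, need):
        b, o = ptr                     # a pointer is a (block, offset) pair
        if b not in self.data or not 0 <= o < len(self.data[b]):
            raise Stuck(f"invalid pointer {ptr}")
        if self.perm[b][o] < need:
            raise Stuck(f"insufficient permission at {ptr}")

    def load(self, ptr):
        self._require(ptr, Perm.READABLE)
        b, o = ptr
        return self.data[b][o]

    def store(self, ptr, value):
        self._require(ptr, Perm.WRITABLE)
        b, o = ptr
        self.data[b][o] = value

    def drop_perm(self, b, perm):
        # Lower every offset of block b to at most `perm`.
        self.perm[b] = [min(p, perm) for p in self.perm[b]]
```

Dropping a block's permissions to `NONE` is how, in this sketch, a module's private memory would be hidden from context code: any later access raises `Stuck` instead of returning a value.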

In  CompCert,  this  unified  memory  model  built  around  blocks  and  permissions  is  used  to  encode 
different  views  of  the  memory.  For  instance,  at  the  C  level  each  variable  is  assigned  its  own  memory 
block,  so  that  the  semantics  of  CompCert  C  reflect  the  C  standard  by  invalidating  pointer  arithmetic 
across  variable  boundaries.  On  the  other  hand,  in  the  emitted  assembly  code,  a  function’s  local 
variables  are  merged  into  a  single  “stack  frame”  memory  block.  CompCert’s  simulation  proof  has 
to  keep  track  of  the  correspondence  between  these  two  views  of  the  memory,  but  the  fact  that  the 
semantics  of  the  source  and  target  languages  are  expressed  in  terms  of  a  unified  memory  framework 
tremendously  simplifies  the  compiler’s  verification.  At  the  assembly  level,  this  model  is  still  slightly 
more  abstract  than  the  hardware,  yet  it  is  sophisticated  enough  that  CompCert’s  stack  layout  pass, 
for  instance,  can  be  properly  verified. 

They follow a similar approach, and extend the semantics of CompCert assembly so that the
CompCert memory model can be equipped with notions of page fault and address translation. A
distinguished block is used to represent the entire address space. The memory model of a layer L
specifies how memory loads and stores are carried out in terms of the system description at that level
of abstraction. The machine memory model, and those implemented by the physical and virtual
memory management components, organize memory in terms of various units (byte, page, address
space), and provide different addressing modes and protection mechanisms. Because our kernel
code is compiled using CompCertX, its own stack frames and static data have to be modeled as
independent blocks. However, as explained in Sec. 4, we prove that user programs can never access
the kernel portion of the address space. They also use an external tool [3] to prove that the stack
usage of our compiled kernel is bounded such that stack overflows cannot occur: the computed
bound is much less than the dedicated 4K bytes we use for kernel stacks.

Integrating the various views of the memory into our layered approach allows us to reason
about memory accesses in the same way that we reason about other kernel services: as long as the
low-level machine memory model, as configured by our kernel code, contextually refines a more
abstract memory model, any code we can write and reason about in terms of the latter can be shown
to have an equivalent behavior when run on top of the former. As shown in Fig. 4(a), the machine
memory model is an unstructured CompCert memory block, which is consistent with the hardware
view of the physical memory. Accesses to this memory block are modeled in a way that mirrors the
operation of the paging hardware. By contrast, in the top-level memory model (which we call the
abstract memory model), address translation cannot be disabled; memory accessors operate on the
basis of the high-level, abstract descriptions of address spaces rather than concrete page directories
and page tables stored in the memory itself (see Fig. 4(b)).


Layer invariant Each abstraction layer specifies a predicate on the memory and layer objects’
abstract states. This invariant is satisfied by the initial state and preserved by memory accessors and
the layer objects’ primitives. It therefore holds in all client contexts, at any point of execution.

In previous verification efforts, proving invariants has typically been challenging. For example,
in seL4, the thread queues are implemented as doubly-linked lists with the following invariant:

Invariant  1.  All  back  links  in  thread  queues  point  to  appropriate  nodes  and  all  elements  point  to 
thread  control  blocks. 

Proving this invariant is difficult for several reasons. As stated in [20]:

Invariants are expensive because they need to be proved not only locally, but for the
whole  kernel  —  we  have  to  show  that  no  other  pointer  manipulation  in  the  kernel 
accidentally destroys the list or its properties. [...] The treatment of globals becomes
especially difficult if the invariants are temporarily violated. For example, adding a new
node to a doubly-linked list temporarily violates invariants that the list is well formed.


However,  in  our  layered  approach,  global  variables  and  the  code  that  manipulates  them  are  abstracted 
as  layer  objects.  The  remaining  kernel  code  cannot  access  the  abstracted  variables  directly,  since 
they  are  hidden  using  CompCert  memory  permissions.  Moreover,  the  abstract  primitives  are  atomic, 
hence  there  is  no  longer  a  point  in  the  execution  at  which  the  invariants  have  to  be  temporarily 
violated.  Finally,  some  complex  invariants  are  implied  by  the  correspondence  with  our  abstract 
representations.  For  instance,  in  our  setting,  Inv.  1  naturally  follows  from  the  contextual  refinement 
between  concrete  thread  queues  and  abstract  “thread  list”  objects. 


Initialization  flag  and  primitive  Each  layer  has  exactly  one  initialization  primitive,  which  can 
be  viewed  as  a  special  layer  object  together  with  the  initialization  flag.  This  logical  initialization 
flag is false in the initial state and is set to true by the initialization primitive. Most of the invariants
and  specifications  of  non-initialization  primitives  require  as  a  precondition  that  the  initialization 
flag  is  true.  This  guarantees  that  the  initialization  primitive  is  the  first  primitive  that  is  executed. 
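
The role of the initialization flag can be sketched with a toy layer. Everything here is hypothetical (the `CounterLayer` class, its `free_pages` state, the `take_page` primitive, and the `Stuck` exception); the point is only that non-initialization primitives have the flag as a precondition and that the invariant is stated for initialized states.

```python
class Stuck(Exception):
    """Raised when a primitive's precondition does not hold."""


class CounterLayer:
    """Hypothetical layer with one abstract state field and two primitives."""

    def __init__(self):
        self.init_flag = False   # logical flag: false in the initial state
        self.free_pages = 0      # abstract state

    def init(self, npages):
        # Initialization primitive: sets up the abstract state, then the flag.
        self.free_pages = npages
        self.init_flag = True

    def take_page(self):
        # Non-initialization primitive: its specification requires init_flag.
        if not self.init_flag:
            raise Stuck("primitive called before initialization")
        if self.free_pages == 0:
            return False
        self.free_pages -= 1
        return True

    def invariant(self):
        # Stated only for initialized states, as most invariants are.
        return (not self.init_flag) or self.free_pages >= 0
```

Because calling `take_page` before `init` gets stuck, any context proved safe on top of this layer must call `init` first.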


3.3  Methods  and  Procedures:  Introducing  Abstraction  Layers 

Introducing  new  layers  is  a  way  to  organize  code  and  lift  the  abstraction  level.  In  most  cases,  this 
does  not  require  modifying  the  implementation.  In  this  section,  the  PI  and  his  team  discuss  some  of 
the  principles  they  used  when  drawing  the  boundaries  of  their  kernel’s  abstraction  layers. 


Principle  1:  Introduce  layers  to  reflect  dependencies  between  kernel  modules  One  purpose 
of  layers  is  to  enforce  code  isolation  and  abstraction.  When  a  module  M  depends  on  another  module 
N,  abstraction  layers  should  be  organized  in  such  a  way  that  M  can  be  reasoned  about  in  terms  of 
an  abstracted  version  of  N. 

For example, since the virtual memory management code relies on physical memory management,
the code which performs allocation and deallocation of physical pages in terms of allocation
tables is first abstracted into a layer object. This object provides the primitives palloc and pfree and
defines their abstract specifications. Then functions such as pt_insert and pt_rmv, which manipulate
page mappings at the virtual memory management level, can be verified with a more abstract
view of the allocation table, without worrying about its concrete memory representation and code
implementation. On the other hand, if two kernel modules mutually depend on each other, they
have to be introduced within a single layer.
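
A minimal sketch of this dependency, under stated assumptions: the allocation table is modeled as a tuple of booleans, `palloc` picks the lowest free page (the report does not state the allocation policy), and the `pt_insert` and `pt_rmv` shown here are simplified stand-ins whose signatures are invented for the example. The clients are written purely against the abstract `palloc`/`pfree` specifications and never look at a concrete table layout.

```python
def palloc(at):
    """Abstract spec: claim the lowest free page.

    `at` is a tuple of booleans (True = allocated). Returns the new table
    and the page index, or the unchanged table and None if none is free.
    """
    for i, used in enumerate(at):
        if not used:
            return at[:i] + (True,) + at[i + 1:], i
    return at, None


def pfree(at, i):
    """Abstract spec: mark page i as free."""
    return at[:i] + (False,) + at[i + 1:]


def pt_insert(at, pmap, vpage):
    """Hypothetical client: back virtual page `vpage` with a fresh page."""
    at2, ppage = palloc(at)
    if ppage is None:
        return at, pmap, False
    return at2, {**pmap, vpage: ppage}, True


def pt_rmv(at, pmap, vpage):
    """Hypothetical client: unmap `vpage` and release its physical page."""
    if vpage not in pmap:
        return at, pmap, False
    rest = {v: p for v, p in pmap.items() if v != vpage}
    return pfree(at, pmap[vpage]), rest, True
```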


Principle 2: Introduce a layer when the memory model changes In the machine memory
model, when paging is enabled, each memory access is accompanied by a two-level page table walk
starting from the address stored in the CR3 register, as shown in Fig. 4(a). Switches of page tables
are performed by storing the top address of the other page table structure into CR3. In the abstract
memory model, we associate with each process a logical partial map from a virtual address to a
pair of physical address and permission. The address translations are performed using the logical
mappings of the currently-running process, as shown in Fig. 4(b). With this high-level memory model,
some complex properties like memory isolation can be proved more easily.

mCertiKOS uses an additional intermediate memory model. The mCertiKOS-hyp extension
presented in Sec. 4 uses yet another, virtualization-related model. They introduce a new layer
whenever they switch from one memory model to another, and they establish the contextual
refinement between the two models.
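
The two views of address translation can be contrasted in a short sketch. It assumes 32-bit x86 paging without PAE (10-bit directory index, 10-bit table index, 12-bit offset, present bit in bit 0), models physical memory as a sparse dictionary from word address to value, and ignores permission bits in the machine walk. The function names and the sampling-based `related` check are inventions of the example; the real development proves the correspondence for all addresses.

```python
PAGE_SIZE = 4096
PRESENT = 0x1          # present bit of an x86 page directory/table entry


class PageFault(Exception):
    pass


def machine_translate(mem, cr3, va):
    """Two-level walk over `mem`, a dict from physical address to 32-bit word."""
    pde = mem.get(cr3 + 4 * (va >> 22), 0)               # directory entry
    if not pde & PRESENT:
        raise PageFault(va)
    table = pde & ~0xFFF                                  # page table base
    pte = mem.get(table + 4 * ((va >> 12) & 0x3FF), 0)    # table entry
    if not pte & PRESENT:
        raise PageFault(va)
    return (pte & ~0xFFF) | (va & 0xFFF)


def abstract_translate(pmap, va):
    """Lookup in a logical partial map: virtual page -> (physical page, perm)."""
    entry = pmap.get(va // PAGE_SIZE)
    if entry is None:
        raise PageFault(va)
    ppage, _perm = entry
    return ppage * PAGE_SIZE + va % PAGE_SIZE


def related(mem, cr3, pmap, vas):
    """Check, on sample addresses only, that both models agree."""
    def run(f, *args):
        try:
            return f(*args)
        except PageFault:
            return None
    return all(run(machine_translate, mem, cr3, va) ==
               run(abstract_translate, pmap, va) for va in vas)
```

Code reasoned about against `abstract_translate` never has to mention page directories or table entries, which is what makes properties such as isolation easier to state at the higher layer.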


Principle  3:  Introduce  a  layer  when  a  stronger  invariant  needs  to  be  proved  After  paging 
is  enabled,  both  kernel  modules  and  user  processes  run  in  a  virtual  address  space.  To  ensure  the 
correctness  of  these  kernel  modules  and  user  processes  on  top  of  virtual  memory  management,  we 
require  the  following  invariants  to  hold: 

Invariant 2. 1) paging is enabled only after the initialization of virtual memory management; 2)
the memory regions that store kernel-specific data must have the kernel-only permission in all
page maps; 3) the page map used by the kernel is an identity map; 4) the non-shared parts of user
processes’ memory are isolated.

Inv.  2  no  longer  holds  if  the  privileged  primitive  that  sets  the  CR3  register  is  present  in  the  layer, 
as  the  unknown  context  code  may  write  an  invalid  address  into  CR3  using  the  provided  primitive. 
To  solve  this  issue,  another  layer  is  introduced  with  a  wrapper  function  that  takes  the  process  id  as 
argument,  instead  of  an  actual  address.  Then  the  function  sets  CR3  to  the  starting  address  of  the 
predefined  corresponding  process’s  page  table  structure.  The  primitive  that  directly  sets  the  CR3 
register  is  hidden  from  the  new  layer,  and  the  invariants  are  introduced  in  the  new  layer.  This  is 
one  of  the  rare  cases  where  performance  overhead  is  introduced  (one  extra  function  call  due  to  the 
wrapper).  It  should  be  possible  to  use  CompCertX’s  function-inlining  optimization  to  remove  this 
overhead  (this  is  left  as  future  work). 
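
The wrapper idea can be sketched as two toy layers. The class and method names (`LowerLayer`, `UpperLayer`, `set_cr3`, `set_pt`) and the choice of an initial page table are assumptions of the sketch, and Python's leading underscore only suggests hiding, whereas the actual layers enforce it logically.

```python
class Stuck(Exception):
    """Raised when a primitive's precondition does not hold."""


class LowerLayer:
    """Exposes the raw, privileged CR3 setter."""

    def __init__(self):
        self.cr3 = 0

    def set_cr3(self, addr):
        self.cr3 = addr          # any address is accepted, valid or not


class UpperLayer:
    """Hides set_cr3 and exposes only a wrapper indexed by process id."""

    def __init__(self, pt_base):
        self._lower = LowerLayer()       # hidden from clients of this layer
        self._pt_base = dict(pt_base)    # pid -> base of its page table structure
        # Assumes at least one process; start from a known-valid page table.
        self.set_pt(min(self._pt_base))

    def set_pt(self, pid):
        if pid not in self._pt_base:
            raise Stuck(f"unknown process id {pid}")
        self._lower.set_cr3(self._pt_base[pid])

    def invariant(self):
        # CR3 always points at one of the predefined page table structures.
        return self._lower.cr3 in self._pt_base.values()
```

With only `set_pt` available, no client call can break `invariant`, whereas a client of `LowerLayer` could store an arbitrary address.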


Principle  4:  Introduce  a  layer  to  facilitate  initialization  proofs  Recall  that  each  layer  contains 
one  initialization  primitive.  This  primitive  can  be  passed  through  from  the  layer  below,  or  a  new 
one  can  be  defined  which  extends  that  of  the  layer  below  so  as  to  initialize  the  new  layer’s  data. 
When  a  new  layer  object  is  introduced,  we  can  create  a  new  layer  to  initialize  its  abstract  data  to  an 
appropriate  state.  In  the  context  of  an  operating  system  kernel,  initialization  functions  are  relatively 
complex.  Introducing  an  extra  layer  allows  us  to  avoid  directly  reasoning  over  the  concrete  memory. 
With this new layer, an initialization function is verified using a more abstract specification.

4 RESULTS AND DISCUSSION

4.1  Certifying  the  mCertiKOS  Kernel 

In this section, the PI and his team describe the main parts of the certification of mCertiKOS. The
mCertiKOS kernel is divided into four main components (see Fig. 5) which consist of multiple
layers: the pre-initialization module (1 layer), the memory management (14 layers), the process
management (14 layers), and the trap handler (4 layers). The pre-initialization module contains
the bottom layer, which corresponds to the physical machine, and the trap handler contains the top
layer, which provides system calls and serves as a specification of the whole kernel. Their main
theorem states that context code that is understood in terms of the topmost abstraction layer has an
equivalent behavior when run along with the kernel on the bottom-most layer.

The overall structure of the layered certification is shown in Fig. 5. Each row in the diagram
describes a layer. It consists of the name of the layer (on the far left) followed by the initialization
primitive (green background), and the memory model used by the layer (blue background). The
rest of the row describes layer objects, each in their own bordered rectangle. Normal white-filled
objects are used to implement new layers, while those filled with gray are hidden from higher
layers. Some objects span across multiple rows and are colored purple, meaning that they are
horizontally composed to implement higher layers. Objects with different subscripts indicate
different abstract views over the same data.


Pre-initialization module The pre-initialization module only contains the bottom-most layer
PreInit. It is used to model the x86 hardware and axiomatizes the hardware behaviors that are
necessary to obtain end-to-end behaviors across the kernel and the user space. These behaviors
include page table walk upon memory load when paging is turned on, saving and restoring part of
the trap frame in the case of interrupts, and switching the stack pointer in the case of ring switch.

The x86 object is the only layer object in the PreInit layer. It extends the CompCert assembly
semantics to model the low-level features of the machine. Its abstract state consists of control
registers, a physical memory map MM, and a kernel mode flag ikern. Its primitives consist of
getter-setter functions for control registers and MM, and a function that models the transition
between user and kernel mode.

The state component MM is the abstraction of the E820 memory map provided by the bootloader.
The control registers, such as CR0, CR2, and CR3, are used to model the behavior of the processor’s
memory management unit (MMU). When paging is enabled (as indicated by CR0), memory accesses
made by both the kernel and the user programs are translated using the page map pointed to by CR3
in the machine memory model. When a page fault occurs, the corresponding information is stored in
CR2 and the page fault handler is invoked. The logical flag ikern indicates whether the processor is
currently in the kernel or user mode. Some privileged memory regions (e.g., allocation table, page
map) and instructions (e.g., modifying control registers) are only available in the kernel mode.

Figure 5: Layers of mCertiKOS


Memory  management  The  memory  management  of  mCertiKOS  consists  of  the  physical  memory 
management  (4  layers),  virtual  memory  management  (7  layers)  and  shared  memory  management  (3 
layers). 

Based  on  the  pre-initialization  layer  and  the  machine  memory  model,  the  physical  memory 
management  abstracts  the  physical  page  allocation  table  into  page  objects.  To  better  reason  about 
access  control  and  isolation  in  the  case  of  the  dynamic  resource  allocation,  each  physical  page 
object  maintains  a  logical  state  containing  ownership  information,  and  the  page  is  only  allowed  to 
be  accessed  by  its  owners. 

On  top  of  physical  memory  management,  the  virtual  memory  management  provides  consecutive 
virtual  address  spaces.  They  proved  not  only  that  the  primitives  of  virtual  memory  management 
manipulate  the  address  space  correctly,  but  also  that  the  initialization  procedure  sets  up  the  two-level 
page  maps  properly  in  terms  of  hardware  address  translation.  The  Inv.  2  they  have  proved  guarantees 
that  it  is  safe  to  run  both  the  kernel  and  user  programs  in  the  virtual  address  space  when  paging  is 
enabled. 

The  shared  memory  management  provides  a  protocol  to  share  physical  pages  among  different 
user  processes.  It  provides  an  infrastructure  to  map  a  physical  page  into  multiple  processes’  page 
maps  in  different  address  spaces.  Their  ownership  mechanism  ensures  that  the  page  can  only  be 
freed  once  all  processes  release  ownership. 
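
The ownership rule can be sketched with a small bookkeeping structure. The `OwnershipTable` class and its method names are hypothetical; the sketch only shows that access is limited to owners and that a shared page is freed when the last owner releases it.

```python
class OwnershipTable:
    """Hypothetical ownership bookkeeping for physical pages."""

    def __init__(self):
        self.owners = {}     # physical page -> set of owning process ids

    def alloc(self, page, pid):
        self.owners[page] = {pid}

    def share(self, page, owner, other):
        # Only a current owner may extend ownership to another process.
        if owner not in self.owners.get(page, set()):
            return False
        self.owners[page].add(other)
        return True

    def can_access(self, page, pid):
        return pid in self.owners.get(page, set())

    def release(self, page, pid):
        """Drop pid's ownership; return True if the page was freed."""
        if not self.can_access(page, pid):
            return False
        self.owners[page].discard(pid)
        if not self.owners[page]:
            del self.owners[page]     # last owner gone: the page is freed
            return True
        return False
```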


Enforcing memory quotas Another function of the physical memory management is to dynamically
track and bound the memory usage (in terms of number of dynamically-allocated pages) of
processes based on their id.

In  mCertiKOS,  they  consider  every  unique  integer  (up  to  some  predefined  maximum,  currently 
2^®)  to  represent  a  different  agent  or  principal.  They  refer  to  this  integer  as  the  agent’s  id,  and  they 
use  it  for  all  layer  objects  owned  by  that  agent. 

The  MContainer  layer  introduces  a  notion  of  container,  inspired  by  container  objects  in  the 
HiStar  operating  system  [42].  Whenever  a  new  agent  (id)  is  created  in  mCertiKOS,  a  container 
is  created  for  the  agent  that  dynamically  keeps  track  of  its  memory  usage.  An  agent’s  usage  may 
increase  for  a  few  reasons,  including  a  direct  request  for  dynamically-allocated  resources,  or  a 
successfully-handled page fault. Each container object is initialized with some maximum quota;
any attempt by an agent to increase its usage beyond this quota will be denied by the kernel.
Furthermore,  the  kernel  maintains  a  mapping  of  ids  to  containers  using  a  hierarchical  tree  structure. 
Whenever  an  agent’s  process  makes  a  request  to  spawn  a  new  process,  the  new  container  is  added 
as  a  child  to  the  requesting  agent’s  container,  and  the  new  container’s  quota  is  taken  from  the 
requester’s.

With this notion of container, they are able to prove a theorem about reliability of dynamic
memory  allocation:  agents’  requests  for  additional  resources  will  always  be  fulfilled  as  long  as 
their  quota  is  not  exceeded.  Furthermore,  from  the  viewpoint  of  information-flow  security,  resource 
quotas  close  the  potential  for  two  different  processes  to  communicate  via  allocation  requests.  Hence 
quota  enforcement  provides  an  additional  level  of  security  for  mCertiKOS.  They  plan  to  extend  the 
concept  of  containers  to  other  types  of  resources  in  the  future.  For  example,  they  could  maintain 
a  time-slice  quota  for  each  agent.  This  would  provide  a  foundation  for  reasoning  about  liveness 
properties  for  processes  and  security  breaches  via  timing  channels. 
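
The container bookkeeping can be sketched as follows. The `Container` class and `pages_held` helper are hypothetical, and charging a child's quota to the parent as usage is one possible reading of "taken from the requester's", adopted here as a modelling assumption.

```python
class Container:
    """Hypothetical per-agent container tracking memory usage against a quota."""

    def __init__(self, quota, parent=None):
        self.quota = quota
        self.usage = 0
        self.parent = parent
        self.children = []

    def alloc(self, pages=1):
        # Denied if the request would push usage beyond the quota.
        if self.usage + pages > self.quota:
            return False
        self.usage += pages
        return True

    def spawn(self, child_quota):
        # The child's quota is carved out of the requester's: here it is
        # charged to the parent as usage (a modelling assumption).
        if not self.alloc(child_quota):
            return None
        child = Container(child_quota, parent=self)
        self.children.append(child)
        return child


def pages_held(c):
    """Pages actually held in c's subtree (excludes quota passed to children)."""
    own = c.usage - sum(ch.quota for ch in c.children)
    return own + sum(pages_held(ch) for ch in c.children)
```

Under this accounting, the pages held by a whole tree never exceed the root quota, so sizing the root quota to the available physical pages is enough for every within-quota request to succeed; this mirrors the reliability property informally and is not a substitute for the proof.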


Process  management  Process  management  depends  on  virtual  address  spaces  and  introduces  the 
thread  and  proc  objects  as  the  abstractions  of  threads  and  processes,  respectively.  One  interesting 
aspect  of  the  process  management  component  is  the  context  switch  function.  This  assembly  function 
saves  the  register  set  of  the  current  thread  and  restores  the  register  set  from  the  kernel  context  of 
another thread. Since the instruction pointer register (EIP) and stack pointer register (ESP) are saved
and  restored  in  this  procedure,  they  can  show  that  this  function  reflects  the  C-level  behavior  and 
restores  the  continuation  of  a  thread’s  execution.  Even  though  this  kernel  context  switch  function  is 
verified  at  assembly  level,  they  prove  that  it  will  not  violate  the  convention  of  ClightX  execution. 
This  enables  us  to  link  it  with  other  code  that  is  verified  at  C-level  and  compiled  by  CompCertX. 

In the process management component, they have also implemented and verified a single-copy
synchronous inter-process communication (IPC) protocol. Additionally, they have verified an
asynchronous zero-copy IPC implementation that is built on top of their shared memory infrastructure.


Trap module The trap module specifies the behaviors of exception handlers and mCertiKOS
system calls. In mCertiKOS, exception handlers are registered in a table of first-class code pointers.
When an exception triggers (via interrupt), the kernel consults this table and invokes the
corresponding exception handler. For example, a page fault at the user level traps into the kernel. The
page fault handler then reserves a page for PFLA (if necessary) and returns to the user level. The
verification of the page fault handler depends on layer objects introduced at different abstraction
levels (see Fig. 6). Therefore, the behavior of the page fault handler is interpreted by the concrete
first-class code pointer until all the dependent layer objects are introduced. Then the handler code is
verified and the behavior is interpreted using its abstract atomic specification.
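
The dispatch through a handler table can be sketched as below. The state dictionary, the `register` and `trap` functions, and the handler body are hypothetical simplifications; only the vector number 14 for x86 page faults and the use of CR2 for the faulting address come from the architecture.

```python
PAGE_SIZE = 4096
PAGE_FAULT = 14          # x86 exception vector for page faults

handlers = {}            # vector number -> handler function


def register(vector, handler):
    handlers[vector] = handler


def page_fault_handler(state):
    # The faulting address is read from CR2; back it with a free page
    # if the current process has no mapping for it yet.
    vpage = state["cr2"] // PAGE_SIZE
    pmap = state["pmap"]
    if vpage not in pmap:
        if not state["free_pages"]:
            return False
        pmap[vpage] = state["free_pages"].pop()
    return True


def trap(vector, state):
    # Consult the table and invoke the corresponding handler.
    handler = handlers.get(vector)
    if handler is None:
        raise KeyError(f"no handler for vector {vector}")
    return handler(state)


register(PAGE_FAULT, page_fault_handler)
```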

To further simplify the reasoning about user code, they have implemented and verified the
user-level system call libraries directly in the user space. Since their machine semantics models
hardware behaviors like paging and ring switch, the specifications of user system call libraries
closely correspond to the real execution model in the actual hardware. With this atomic system
call semantics at the user level, the user code can be verified much more easily.

Figure 6: Call graph of page fault handler


4.2  Extensions  and  Adaptation 

One primary advantage of the PI and his team’s new extensible architecture is that it makes certified
kernel extension and reasoning much easier and more principled. In this section, they describe three
alternative mCertiKOS kernels that they created through relatively minor changes to the base kernel.
They then present a specific example of global reasoning over the mCertiKOS kernel: a simple
notion of address space isolation that will serve as a starting point for a full-fledged security proof
in the future.


mCertiKOS-hyp:  supporting  virtualization  They  also  augmented  mCertiKOS  to  support  the 
two  hardware-assisted  virtualization  technologies  Intel  VT-x  and  AMD  SVM,  and  built  a  certified 
hypervisor  mCertiKOS-hyp. 

Fig.  7  shows  the  7  layers  of  the  virtual  machine  management  of  mCertiKOS-hyp  on  the  Intel 
platform. VMInfo is the layer object that axiomatizes some of the hardware-specific features needed
for  the  virtualization  support.  Since  it  is  orthogonal  to  memory  and  process  management,  the 
VMInfo  object  can  be  horizontally  composed  with  the  layers  below  PProc  in  mCertiKOS.  On  top  of 
this  extended  PProc  layer,  the  virtual  machine  management  extends  the  abstract  memory  model  with 
the  notions  of  Extended  Page  Table  (EPT),  the  virtual  machine  control  structure  (VMCS),  and  the 
virtual  machine  extension  meta  data  (VMX),  which  are  abstracted  into  corresponding  layer  objects. 
These  objects  are  again  orthogonal  to  the  trap  module  above  and  can  be  horizontally  composed  to 
export related system calls with minimal cost.

Figure 7: Layers of virtual machine management

mCertiKOS-rz: supporting ring 0 processes Thanks to the contextual refinement relation they
have proved for mCertiKOS, one can certify user programs using their formal specifications of
system calls. This gives end-to-end proofs on the behaviors of user programs when they run on
mCertiKOS.  Furthermore,  once  certified,  these  processes  can  safely  run  in  the  privileged  ring  0 
mode.  They  extended  mCertiKOS  into  mCertiKOS-rz  by  adding  support  for  spawning  “in-kernel 
processes”  that  run  in  the  privileged  ring  0  mode.  Ring  0  processes  get  much  better  system  call 
performance  by  directly  calling  kernel  functions  and  avoiding  ring  switch  and  interrupt  processing. 

To  introduce  ring  0  processes  to  mCertiKOS,  they  added  a  single  layer  on  top  of  the  existing 
process  management  module:  Spawning  a  ring  0  process  sets  the  initial  ESP  register  to  a  preallocated 
memory  region  and  then  spawns  a  proper  kernel  thread.  The  memory  region  must  be  verifiably 
sufficient  for  the  entire  execution  of  the  process. 


mCertiKOS-emb:  embedded  systems  The  mCertiKOS-emb  kernel  is  intended  for  embedded 
settings.  To  develop  this  kernel  they  started  with  mCertiKOS-rz  and  removed  the  virtual  machine 
management,  the  virtual  memory  management,  and  some  of  the  process  management  layers  that  are 
related  to  user  contexts  and  user  process  management.  Thus  mCertiKOS-emb  only  supports  ring  0 
processes  which  run  directly  inside  the  physical  kernel  address  space  instead  of  the  user-level  paged 
virtual  address  space. 

Removing  plug-ins  or  layers  does  not  take  much  effort.  They  only  need  to  alter  the  contextual 
refinement  proof  at  the  boundary  so  they  can  glue  them  back  together. 


Figure 8: Performance evaluation with micro benchmarks.

Isolation in mCertiKOS They have begun exploring the verification of a global security property
on top of mCertiKOS. As a starting point, they proved a basic notion of isolation between user-level
processes running in different virtual address spaces. This isolation property is composed of two
theorems: one regarding integrity (write protection), and another regarding confidentiality (read
protection, or noninterference). The statements of these two theorems are as follows: suppose the
top layer abstract machine takes one step, changing the machine state from S to S', and let p be the
id of the currently-running process (which can be found in S).

Integrity: If the value at some non-kernel memory location l differs between S and S', then l
belongs to a page that is mapped in the virtual address space of p.

Confidentiality:  If  the  step  taken  is  not  a  primitive  call  to  an  IPC  syscall  (send,  recv,  etc.),  then  the 
values  of  memory  in  any  address  space  other  than  p’s  cannot  have  an  effect  on  the  result  of 
the  step.  In  other  words,  if  they  altered  S  by  changing  data  in  a  different  process’s  address 
space,  the  step  would  still  have  the  same  effect  on  p’s  address  space. 
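
The two statements can be illustrated on a toy model. The following Python sketch is not taken from the mCertiKOS development: the State class, the ownership map, and the step function are assumptions made purely for illustration, and the checks are run on a single example state rather than proved for all states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    mem: tuple      # mem[l] is the value stored at location l
    owner: tuple    # owner[l] is the id of the process whose address space maps l
    current: int    # id p of the currently running process


def step(s: State) -> State:
    """A hypothetical non-IPC step: p increments every location it owns."""
    new_mem = tuple(v + 1 if s.owner[l] == s.current else v
                    for l, v in enumerate(s.mem))
    return State(new_mem, s.owner, s.current)


def own_view(s: State) -> tuple:
    """The part of memory that belongs to the address space of p."""
    return tuple(v for l, v in enumerate(s.mem) if s.owner[l] == s.current)


def integrity_holds(s: State) -> bool:
    """Every location that differs between S and S' is mapped by p."""
    s2 = step(s)
    return all(s.owner[l] == s.current
               for l in range(len(s.mem)) if s.mem[l] != s2.mem[l])


def confidentiality_holds(s: State, other_values: tuple) -> bool:
    """Changing data in other address spaces leaves the effect on p unchanged."""
    altered_mem = tuple(v if s.owner[l] == s.current else other_values[l]
                        for l, v in enumerate(s.mem))
    altered = State(altered_mem, s.owner, s.current)
    return own_view(step(s)) == own_view(step(altered))


s = State(mem=(1, 2, 3, 4), owner=(0, 0, 1, 1), current=0)
print(integrity_holds(s))                      # True
print(confidentiality_holds(s, (9, 9, 9, 9)))  # True
```

In the report both properties are theorems about the top layer abstract machine; the sketch only shows the shape of the two statements.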

In  the  future,  they  plan  to  provide  a  more  detailed  security  policy  by  describing  what  can  happen 
to  confidentiality  when  IPC  is  used.  This  description  will  be  expressed  in  terms  of  propagation  of 
security  labels  on  the  IPC  data.  Note,  however,  that  their  framework  allows  for  security  labels  to  be 
specified  at  a  purely  logical  level  —  there  is  no  need  for  concrete  representation  and  manipulation 
of  labels  at  run  time. 

Noninterference  properties  are  generally  not  preserved  across  refinement  due  to  nondeterminism. 
It  may  therefore  seem  that  the  aforementioned  confidentiality  holds  only  at  the  topmost  layer, 
but  not  at  lower  layers.  It  turns  out,  however,  that  their  notion  of  deep  specification  is  strong 
enough  to  preserve  noninterference.  Essentially,  to  give  a  deep  specification  to  a  nondeterministic 
semantics,  they  must  first  externalize  the  source  of  nondeterminism  (e.g.,  into  an  oracle).  The 
noninterference  property  then  becomes  parameterized  over  this  source  of  nondeterminism,  which 
allows  the  parameterized  property  to  be  preserved  across  refinement.  This  relationship  between 
deep  specification,  noninterference,  and  refinement  will  be  explored  comprehensively  in  future 
work. 
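
A small toy example may help to see why externalizing nondeterminism matters. The Python sketch below is an assumption-laden illustration, not the formal development: a one-bit secret, a one-bit oracle, and the function names are all hypothetical. It shows that a leaky implementation can refine a nondeterministic specification, whereas it cannot refine the specification once the choice is a parameter.

```python
from itertools import product

SECRETS = [0, 1]
ORACLES = [0, 1]   # an oracle is modeled as a single pre-chosen bit


def spec_nondet(secret):
    """Nondeterministic spec: the set of allowed outputs, independent of the secret."""
    return {0, 1}


def spec_oracle(secret, oracle):
    """The same spec after the source of nondeterminism became a parameter."""
    return oracle


def impl_leaky(secret, oracle=None):
    """Picks an allowed output, but chooses it by looking at the secret."""
    return secret


def impl_ok(secret, oracle):
    """Resolves the choice using only the oracle."""
    return oracle


def refines_nondet(impl):
    return all(impl(s) in spec_nondet(s) for s in SECRETS)


def refines_oracle(impl):
    return all(impl(s, o) == spec_oracle(s, o)
               for s, o in product(SECRETS, ORACLES))


def noninterferent(impl):
    """For every fixed oracle, the output does not depend on the secret."""
    return all(len({impl(s, o) for s in SECRETS}) == 1 for o in ORACLES)


print(refines_nondet(impl_leaky), noninterferent(impl_leaky))  # True False
print(refines_oracle(impl_leaky))                              # False
print(refines_oracle(impl_ok), noninterferent(impl_ok))        # True True
```

The leaky implementation is a valid refinement only of the nondeterministic specification, which is the failure mode the oracle parameterization rules out.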

4.3 Performance Evaluation and Proof Effort


The PI and his team have also analyzed the performance of the mCertiKOS-hyp hypervisor kernel
with a thorough experimental benchmark evaluation. Furthermore, an extended version of
mCertiKOS-hyp was deployed in a practical system that is used in the context of another related
research project funded by the DARPA HACMS program. Their experiments with benchmarks
confirm the observations made during deployment: the performance overhead of mCertiKOS-hyp
is moderate. They are convinced that it is practical to use their verification framework to produce
competitive real-world kernels with acceptable effort.


Performance evaluation They used a number of micro and macro benchmarks to measure the
overhead of mCertiKOS-hyp and to compare mCertiKOS-hyp to existing systems such as KVM
and seL4. All experiments were performed on an Intel Core i7-2600S with 8 MB L3 cache,
16 GB memory, and a 120 GB Intel 520 SSD. Since the power control code has not been verified,
they disabled the turbo boost and power management features of the hardware during experiments.

A comparison of the performance of seL4 and mCertiKOS-hyp is not straightforward since
the mCertiKOS kernels run on x86 platforms but the verified seL4 runs on ARMv6 and ARMv7
hardware. Moreover, the verified version of seL4 does not have virtualization support and cannot
boot Linux. As a result, they do not compare hypervisor performance but instead focus on a
comparison of the IPC performance of mCertiKOS-hyp and an unverified x86 version of seL4.


IPC Performance They compared IPC in mCertiKOS-hyp and the (unverified) x86 version of
seL4. They used seL4’s IPC benchmark sel4bench-manifest¹ with processes in different address
spaces and with identical scheduler priorities, both in slowpath and fastpath configurations. To
run this benchmark on mCertiKOS-hyp, they replaced seL4’s Call and ReplyWait operations
with mCertiKOS-hyp synchronous send and receive operations. Fig. 8 (on the right) contains a
compilation of their results. It shows the average number of clock cycles needed for the operations
for message sizes 0 and 1000.

Because seL4 follows the microkernel design philosophy, its IPC performance is critical. IPC
implementations  in  seL4  are  highly  optimized,  and  heavily  tailored  to  specific  hardware  platforms. 
While  this  degree  of  optimization  gives  seL4  an  advantage  in  IPC  intensive  systems,  they  currently 
do  not  see  the  need  to  improve  IPC  performance  in  mCertiKOS-hyp  for  application  scenarios  of  the 
kernel  that  they  have  in  mind. 


Hypervisor Performance To evaluate mCertiKOS-hyp as a hypervisor, they measured the
performance of micro and macro benchmarks on Ubuntu 12.04.2 LTS running as a guest.

¹ https://github.com/smaccm/sel4bench-manifest

Figure 9: Normalized macro benchmarks: Linux on KVM and mCTOS, baseline is Linux on bare
metal

Fig.  9  contains  a  compilation  of  standard  macro  benchmarks:  unpacking  of  the  Linux  4.0-rc4 
kernel,  compilation  of  the  Linux  4.0-rc4  kernel,  and  Apache  HTTPerf.  They  ran  the  benchmarks  on 
Linux  as  guest  in  KVM  and  mCertiKOS-hyp,  as  well  as  on  bare  metal.  In  Fig.  9  they  normalized  the 
run  times  of  the  benchmarks  using  the  bare  metal  performance  as  a  baseline  (100%).  The  overhead 
of  mCertiKOS-hyp  is  moderate  and  comparable  to  KVM.  They  attribute  the  larger  overhead  for 
decompression  to  their  unverified  SSD  driver  that  still  contains  performance  bugs  (compare  disk 
dump  in  Fig.  8). 
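
The normalization used for Fig. 9 can be sketched as follows. The run times in the example are hypothetical placeholders chosen for illustration; they are not measurements from the report, and the function name is an assumption.

```python
def normalize(run_times: dict, baseline: str = "bare metal") -> dict:
    """Express each run time as a percentage of the baseline run time."""
    base = run_times[baseline]
    return {name: 100.0 * t / base for name, t in run_times.items()}


# Hypothetical run times in seconds, for illustration only.
example = {"bare metal": 50.0, "KVM": 55.0, "mCertiKOS-hyp": 60.0}
normalized = normalize(example)
for name, percent in normalized.items():
    print(f"{name}: {percent:.1f}%")
```

With this convention the baseline is always 100%, and a value of 110% means the benchmark took 10% longer than on bare metal.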

Fig.  8  (on  the  left)  shows  a  compilation  of  micro  benchmarks  from  the  LMbench  benchmark 
suite  [29].  They  measure  the  performance  of  the  file  system,  some  local  communication  systems, 
virtual  memory,  context  switch  and,  for  sanity  checking,  basic  arithmetic  operations.  On  the  x-axes 
of  the  plots  are  the  names  of  the  respective  LMbench  benchmarks.  The  y-axes  of  the  two  plots 
at  the  top  left  show  the  run  time  in  nanoseconds  and  microseconds,  respectively.  The  other  three 
y-axes  show  the  throughput  in  MB/s. 

In  many  cases,  the  performance  of  mCertiKOS-hyp  is  in  between  bare  metal  and  KVM  (Kernel 
Virtual  Machine).  However,  there  are  still  some  rough  edges  in  the  results  that  they  mostly  attribute 
to  performance  problems  with  their  unverified  SSD  driver.  This  is  indicated  for  instance  by  the 
disk  dump  benchmark  in  which  the  transfer  rate  seems  to  remain  constant  as  the  data  size  increases. 
They  are  currently  investigating  the  issue. 

The  virtualization  drivers  in  mCertiKOS-hyp  are  running  in  a  user  process  in  the  ring  3  mode. 
This  approach  makes  the  kernel  smaller  and  makes  it  possible  to  use  an  unverified  driver.  The 
downside  of  this  approach  is  that  each  VM  entry  and  exit  causes  an  additional  ring  switch,  and 
VM-related  information  must  be  copied  to  the  user  driver  process  in  order  for  it  to  process  the 
exit.  Therefore,  it  may  have  an  impact  on  performance,  especially  for  those  guest  programs  that 
frequently  cause  VM  exits,  such  as  web  servers,  which  generate  frequent  network-related  external 
interrupts. Another approach is to verify the drivers and run them inside a kernel module, e.g., in a
ring 0 process.


Proof  effort  The  PI  and  his  team  completed  the  verification  of  mCertiKOS-hyp  in  less  than  18 
person  months  (pm).  The  layer  design  and  verification  took  about  3  pm  for  the  physical  memory 
management  (4  layers),  3.5  pm  for  the  virtual  memory  module  (7  layers),  1  pm  for  the  shared 
memory  infrastructure  (3  layers),  3.5  pm  for  the  thread  management  (10  layers),  1  pm  for  the 
process  management  (4  layers),  1.5  pm  for  the  trap  handler  module  (4  layers),  1.5  pm  for  the  AMD 
SVM  virtualization  (9  layers),  and  2  pm  for  the  Intel  VT-x  virtualization  support  (7  layers).  In  total 
the  verified  mCertiKOS-hyp  kernel  consists  of  5500  lines  of  C  and  x86  assembly  code. 
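
The per-module figures above can be tallied to check them against the stated total of less than 18 person months. The short Python sketch below only sums the numbers listed in this paragraph; the dictionary layout is an assumption made for illustration.

```python
# (person months, number of layers) per module, as listed in the paragraph above.
effort = {
    "physical memory management": (3.0, 4),
    "virtual memory module": (3.5, 7),
    "shared memory infrastructure": (1.0, 3),
    "thread management": (3.5, 10),
    "process management": (1.0, 4),
    "trap handler module": (1.5, 4),
    "AMD SVM virtualization": (1.5, 9),
    "Intel VT-x virtualization": (2.0, 7),
}

total_pm = sum(pm for pm, _ in effort.values())
total_layers = sum(layers for _, layers in effort.values())
print(total_pm, total_layers)  # 17.0 48
```

The listed items add up to 17 person months, which is consistent with the claim of less than 18.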

The  verification  effort  roughly  falls  into  three  categories:  layer  design  with  specification  and 
invariants,  refinement  proofs  between  the  layers,  and  verification  of  C  and  assembly  code  with 
respect  to  the  specifications.  The  time  needed  for  each  of  the  categories  depends  largely  on  the  layer. 
For  instance,  at  the  boundary  of  physical  and  virtual  memory  management  (MPTIntro),  almost 
all  effort  is  in  the  refinement  proof,  due  to  the  proof  for  the  refinement  between  two  completely 
different  memory  models.  More  effort  went  into  the  refinement  proof  when  they  introduced  the 
Intel virtual machine memory model, where they proved the refinement between the concrete
four-level extended page table structure in memory and the abstract mapping from the guest addresses to
the  host  addresses.  In  contrast,  for  the  layer  MATOp,  which  initializes  physical  memory  allocation, 
most  of  the  time  was  spent  on  verifying  the  non-trivial  nested  loops  present  in  the  C  code,  while  the 
refinement  proofs  were  derived  automatically. 

The  proofs  were  facilitated  by  automation  tools  for  C  code,  layer  design  patterns,  and  tactics 
libraries  developed  in  recent  years  [14].  These  tools  have  greatly  reduced  the  amount  of  work 
needed  to  verify  extensions  of  the  kernel. 


4.4  Other  Important  Results 

In  addition  to  developing  new  cutting-edge  technologies  for  building  certified  OS  kernels,  the  PI 
and  his  team  have  also  obtained  the  following  important  results.  We  annotate  each  technology 
with  a  publication  venue  where  the  main  result  is  first  published.  Here,  POPL  refers  to  “ACM 
SIGPLAN-SIGACT  Annual  Symposium  on  Principles  of  Programming  Languages;”  PLDI  refers  to 
“ACM  SIGPLAN  Conference  on  Programming  Language  Design  and  Implementation;”  ESOP  refers 
to  “European  Symposium  on  Programming;”  APLAS  refers  to  “Asian  Symposium  on  Programming 
Languages  and  Systems;”  CPP  refers  to  “International  Conference  on  Certified  Programs  and 
Proofs;” LICS refers to “IEEE International Conference on Logic in Computer Science;” POST
refers  to  “International  Conference  on  Principles  of  Security  and  Trust;”  CONCUR  refers  to 
“International  Conference  on  Concurrency  Theory.”  All  papers  referenced  here  are  attached  in  the 
Appendix.


Deep specifications and certified abstraction layers (POPL’15) Modern computer systems
consist  of  a  multitude  of  abstraction  layers  (e.g.,  OS  kernels,  hypervisors,  device  drivers,  network 
protocols),  each  of  which  defines  an  interface  that  hides  the  implementation  details  of  a  particular 
set  of  functionality.  Client  programs  built  on  top  of  each  layer  can  be  understood  solely  based  on  the 
interface,  independent  of  the  layer  implementation.  Despite  their  obvious  importance,  abstraction 
layers  have  mostly  been  treated  as  a  system  concept;  they  have  almost  never  been  formally  specified 
or  verified.  This  makes  it  difficult  to  establish  strong  correctness  properties,  and  to  scale  program 
verification  across  multiple  layers. 

In  this  work,  the  PI  and  his  team  present  a  novel  language-based  account  of  abstraction  layers 
and  show  that  they  correspond  to  a  strong  form  of  abstraction  over  a  particularly  rich  class  of 
specifications  which  they  call  deep  specifications.  Just  as  data  abstraction  in  typed  functional 
languages  leads  to  the  important  representation  independence  property,  abstraction  over  deep 
specification  is  characterized  by  an  important  implementation  independence  property:  any  two 
implementations  of  the  same  deep  specification  must  have  contextually  equivalent  behaviors.  They 
present  a  new  layer  calculus  showing  how  to  formally  specify,  program,  verify,  and  compose 
abstraction  layers.  They  show  how  to  instantiate  the  layer  calculus  in  realistic  programming 
languages  such  as  C  and  assembly,  and  how  to  adapt  the  CompCert  verified  compiler  to  compile 
certified  C  layers  such  that  they  can  be  linked  with  assembly  layers.  Using  these  new  languages  and 
tools,  they  have  successfully  developed  multiple  certified  OS  kernels  in  the  Coq  proof  assistant. 


Compositional Certified Resource Bounds (PLDI’15) In this work, the PI and his team developed
a new approach for automatically deriving worst-case resource bounds for C programs.
The  described  technique  combines  ideas  from  amortized  analysis  and  abstract  interpretation  in 
a  unified  framework  to  address  four  challenges  for  state-of-the-art  techniques:  compositionality, 
user  interaction,  generation  of  proof  certificates,  and  scalability.  Compositionality  is  achieved 
by  incorporating  the  potential  method  of  amortized  analysis.  It  enables  the  derivation  of  global 
whole-program  bounds  with  local  derivation  rules  by  naturally  tracking  size  changes  of  variables  in 
sequenced  loops  and  function  calls.  The  resource  consumption  of  functions  is  described  abstractly 
and a function call can be analyzed without access to the function body. User interaction is supported
with a new mechanism that clearly separates qualitative and quantitative verification. A
user  can  guide  the  analysis  to  derive  complex  non-linear  bounds  by  using  auxiliary  variables  and 
assertions.  The  assertions  are  separately  proved  using  established  qualitative  techniques  such  as 
abstract  interpretation  or  Hoare  logic.  Proof  certificates  are  automatically  generated  from  the  local 
derivation  rules.  A  soundness  proof  of  the  derivation  system  with  respect  to  a  formal  cost  semantics 
guarantees  the  validity  of  the  certificates.  Scalability  is  attained  by  an  efficient  reduction  of  bound 
inference  to  a  linear  optimization  problem  that  can  be  solved  by  off-the-shelf  LP  solvers.  The 
analysis  framework  is  implemented  in  the  publicly-available  tool  C4B.  An  experimental  evaluation 
demonstrates  the  advantages  of  the  new  technique  with  a  comparison  of  C4B  with  existing  tools  on 
challenging micro benchmarks and the analysis of more than 2900 lines of C code from the cBench
benchmark suite.
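
To make the notions of a worst-case bound and of the potential method concrete, the following Python sketch instruments a simple counting loop. It is not C4B and does not reproduce its inference algorithm; the loop, the tick counter, and the potential function max(0, y - x) are assumptions chosen for illustration.

```python
def potential(x: int, y: int) -> int:
    """Potential assigned to a state: the size of the interval [x, y]."""
    return max(0, y - x)


def run_loop(x: int, y: int) -> int:
    """Execute `while x < y: x += 1`, charging one tick per iteration."""
    initial = potential(x, y)
    ticks = 0
    while x < y:
        x += 1
        ticks += 1
        # Potential method: cost spent so far plus remaining potential
        # never exceeds the initial potential.
        assert ticks + potential(x, y) <= initial
    return ticks


cases = [(0, 10), (5, 5), (7, 3), (-4, 2)]
results = [(run_loop(x, y), potential(x, y)) for x, y in cases]
print(results)  # [(10, 10), (0, 0), (0, 0), (6, 6)]
```

Each iteration is paid for by a one-unit drop in potential, so the initial potential is a bound on the total cost of the loop.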


Automatic Static Cost Analysis for Parallel Programs (ESOP’15) Static analysis of the evaluation
cost of programs is an extensively studied problem that has many important applications.
However, most automatic methods for static cost analysis are limited to sequential evaluation while
programs are increasingly evaluated on modern multicore and multiprocessor hardware. This work
introduces  the  first  automatic  analysis  for  deriving  bounds  on  the  worst-case  evaluation  cost  of 
parallel  first-order  functional  programs.  The  analysis  is  performed  by  a  novel  type  system  for 
amortized  resource  analysis.  The  main  innovation  is  a  technique  that  separates  the  reasoning  about 
sizes  of  data  structures  and  evaluation  cost  within  the  same  framework.  The  cost  semantics  of 
parallel  programs  is  based  on  call-by-value  evaluation  and  the  standard  cost  measures  work  and 
depth.  A  soundness  proof  of  the  type  system  establishes  the  correctness  of  the  derived  cost  bounds 
with  respect  to  the  cost  semantics.  The  derived  bounds  are  multivariate  resource  polynomials 
which  depend  on  the  sizes  of  the  arguments  of  a  function.  Type  inference  can  be  reduced  to  linear 
programming  and  is  fully  automatic.  A  prototype  implementation  of  the  analysis  system  has  been 
developed  to  experimentally  evaluate  the  effectiveness  of  the  approach.  The  experiments  show 
that  the  analysis  infers  bounds  for  realistic  example  programs  such  as  quick  sort  for  lists  of  lists, 
matrix  multiplication,  and  an  implementation  of  sets  with  lists.  The  derived  bounds  are  often 
asymptotically  tight  and  the  constant  factors  are  close  to  the  optimal  ones. 
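
The two cost measures mentioned above, work and depth, can be illustrated on a tiny expression language. The Python sketch below is an assumed toy model (the Leaf, Seq, and Par constructors are hypothetical); it evaluates the two measures directly and is not the type-based analysis described in the paper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    cost: int            # cost of a primitive computation


@dataclass
class Seq:
    left: "Expr"         # evaluated first
    right: "Expr"        # evaluated afterwards


@dataclass
class Par:
    left: "Expr"         # both sides may be evaluated in parallel
    right: "Expr"


Expr = Union[Leaf, Seq, Par]


def work(e: Expr) -> int:
    """Total number of steps, i.e. the cost on a single processor."""
    if isinstance(e, Leaf):
        return e.cost
    return work(e.left) + work(e.right)


def depth(e: Expr) -> int:
    """Length of the critical path, i.e. the cost with unlimited processors."""
    if isinstance(e, Leaf):
        return e.cost
    if isinstance(e, Seq):
        return depth(e.left) + depth(e.right)
    return max(depth(e.left), depth(e.right))


example = Par(Seq(Leaf(2), Leaf(3)), Leaf(4))
print(work(example), depth(example))  # 9 5
```

Sequential composition adds both measures, while parallel composition adds work but takes the maximum of the depths.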


A Compositional Semantics for Verified Separate Compilation and Linking (CPP’15) Recent
ground-breaking efforts such as CompCert have made a convincing case that mechanized
verification of the compiler correctness for realistic C programs is both viable and practical. Unfortunately,
existing verified compilers can only handle whole programs; this severely limits their
applicability and prevents the linking of verified C programs with verified external libraries. In this
work,  the  PI  and  his  team  present  a  novel  compositional  semantics  for  reasoning  about  open  modules 
and  for  supporting  verified  separate  compilation  and  linking.  More  specifically,  they  replace  external 
function  calls  with  explicit  events  in  the  behavioral  semantics.  They  then  develop  a  verified  linking 
operator  that  makes  lazy  substitutions  on  (potentially  reacting)  behaviors  by  replacing  each  external 
function call event with a behavior simulating the requested function. Finally, they show how
their new semantics can be applied to build a refinement infrastructure that supports both vertical
composition  and  horizontal  composition. 
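
The idea of replacing external call events lazily can be sketched with Python generators. This is a strongly simplified illustration under stated assumptions: behaviors are plain event sequences, return values of external calls are ignored, and the event encoding and the link function are hypothetical rather than the operator defined in the paper.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Event = Tuple          # ("out", value) or ("call", function_name, argument)
Library = Dict[str, Callable[[int], Iterable[Event]]]


def link(behavior: Iterable[Event], library: Library) -> Iterator[Event]:
    """Lazily replace each external call event by the callee's behavior."""
    for event in behavior:
        if event[0] == "call" and event[1] in library:
            # Substitute the behavior of the requested function; it may
            # itself contain call events, so it is linked recursively.
            yield from link(library[event[1]](event[2]), library)
        else:
            yield event   # other events, and unresolved calls, are kept


def twice(n: int) -> Iterable[Event]:
    return [("out", 2 * n)]


main = [("out", "start"), ("call", "twice", 3), ("call", "unknown", 0), ("out", "end")]
linked = list(link(main, {"twice": twice}))
print(linked)
```

Because link is a generator, substitution happens only as events are consumed, and calls to functions outside the library remain visible as events of the still-open module.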


Compositional  Verification  of  Termination-Preserving  Refinement  of  Concurrent  Programs 
(LICS’14)  Many  verification  problems  can  be  reduced  to  refinement  verification.  However, 
existing  work  on  verifying  refinement  of  concurrent  programs  either  fails  to  prove  the  preservation 
of termination, allowing a diverging program to trivially refine any program, or is difficult to
apply in compositional thread-local reasoning. In this work, the PI and his colleague at USTC first
propose a new simulation technique, which establishes termination-preserving refinement and is a
congruence with respect to parallel composition. Then they give a proof theory for the simulation,
which is the first Hoare-style concurrent program logic supporting termination-preserving refinement
proofs. They show two key applications of their logic, i.e., verifying linearizability and lock-freedom
together  for  fine-grained  concurrent  objects,  and  verifying  full  correctness  of  optimizations  of 
concurrent  algorithms. 


End-to-End Verification of Stack-Space Bounds for C Programs (PLDI’14) Verified compilers
guarantee the preservation of semantic properties and thus enable formal verification of programs
at  the  source  level.  However,  important  quantitative  properties  such  as  memory  and  time  usage 
still  have  to  be  verified  at  the  machine  level  where  interactive  proofs  tend  to  be  more  tedious  and 
automation  is  more  challenging.  In  this  work,  the  PI  and  his  team  develop  a  new  framework  that 
enables  the  formal  verification  of  stack-space  bounds  of  compiled  machine  code  at  the  C  level.  It 
consists  of  a  verified  CompCert-based  compiler  that  preserves  quantitative  properties,  a  verified 
quantitative  program  logic  for  interactive  stack-bound  development,  and  a  verified  stack  analyzer 
that  automatically  derives  stack  bounds  during  compilation. 

The  framework  is  based  on  event  traces  that  record  function  calls  and  returns.  The  source 
language  is  CompCert  Clight  and  the  target  language  is  x86  assembly.  The  compiler  is  implemented 
in  the  Coq  Proof  Assistant  and  it  is  proved  that  crucial  properties  of  event  traces  are  preserved 
during  compilation.  A  novel  quantitative  Hoare  logic  is  developed  to  verify  stack-space  bounds  at 
the  CompCert  Clight  level.  The  quantitative  logic  is  implemented  in  Coq  and  proved  sound  with 
respect  to  event  traces  generated  by  the  small-step  semantics  of  CompCert  Clight.  Stack-space 
bounds  can  be  proved  at  the  source  level  without  taking  into  account  low-level  details  that  depend  on 
the  implementation  of  the  compiler.  The  compiler  fills  in  these  low-level  details  during  compilation 
and  generates  a  concrete  stack-space  bound  that  applies  to  the  produced  machine  code.  The  verified 
stack  analyzer  is  guaranteed  to  automatically  derive  bounds  for  code  with  non-recursive  functions. 
It  generates  a  derivation  in  the  quantitative  logic  to  ensure  soundness  as  well  as  interoperability 
with  interactively  developed  stack  bounds.  In  an  experimental  evaluation,  the  developed  framework 
is  used  to  obtain  verified  stack-space  bounds  for  micro  benchmarks  as  well  as  real  system  code.  The 
examples  include  the  verified  operating-system  kernel  CertiKOS,  parts  of  the  MiBench  embedded 
benchmark  suite,  and  programs  from  the  CompCert  benchmarks.  The  derived  bounds  are  close  to 
the  measured  stack-space  usage  of  executions  of  the  compiled  programs  on  a  Linux  x86  system. 
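
The relationship between event traces and stack bounds can be illustrated with a small sketch. The Python code below is not the verified analyzer: the frame sizes, the call graph, and the trace are hypothetical, and the static bound assumes a non-recursive program, as the automatic analyzer in the paper does.

```python
def peak_stack_usage(trace, frame_size):
    """Replay a trace of call/return events and return the peak stack usage."""
    current = peak = 0
    for kind, fn in trace:
        if kind == "call":
            current += frame_size[fn]
            peak = max(peak, current)
        elif kind == "ret":
            current -= frame_size[fn]
    return peak


def static_bound(call_graph, frame_size, fn):
    """Bound for a non-recursive function: own frame plus the deepest callee."""
    callees = call_graph.get(fn, [])
    deepest = max((static_bound(call_graph, frame_size, g) for g in callees),
                  default=0)
    return frame_size[fn] + deepest


# Hypothetical frame sizes in bytes and a hypothetical call graph.
frame_size = {"main": 16, "f": 32, "g": 8}
call_graph = {"main": ["f", "g"], "f": ["g"]}
trace = [("call", "main"), ("call", "f"), ("call", "g"), ("ret", "g"),
         ("ret", "f"), ("call", "g"), ("ret", "g"), ("ret", "main")]

bound = static_bound(call_graph, frame_size, "main")
peak = peak_stack_usage(trace, frame_size)
print(bound, peak)  # 56 56
```

In the framework described above the frame sizes are filled in by the compiler, which is why bounds can be stated at the source level without committing to low-level details.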


A  Separation  Logic  for  Enforcing  Declarative  Information  Flow  Control  Policies  (POST’14) 

In  this  work,  the  PI  and  his  student  develop  a  new  program  logic  for  proving  that  a  program  does 
not  release  information  about  sensitive  data  in  an  unintended  way.  The  most  important  feature  of 
the logic is that it provides a formal security guarantee while supporting “declassification policies”
that  describe  precise  conditions  under  which  a  piece  of  sensitive  data  can  be  released.  They  leverage 
the  power  of  Hoare  Logic  to  express  the  policies  and  security  guarantee  in  terms  of  state  predicates. 
This allows their system to be far more specific regarding declassification conditions than most
other information flow systems. The logic is designed for reasoning about a C-like, imperative
language with pointer manipulation and aliasing. They therefore make use of ideas from Separation
Logic to reason about data in the heap.


Characterizing Progress Properties of Concurrent Objects via Contextual Refinements (CONCUR’13)
Implementations of concurrent objects should guarantee linearizability and a progress
property such as wait-freedom, lock-freedom, obstruction-freedom, starvation-freedom, or
deadlock-freedom. Conventional informal or semi-formal definitions of these progress properties describe
conditions under which a method call is guaranteed to complete, but it is unclear how these
definitions can be utilized to formally verify system software in a layered and modular way. In this
work, the PI and his team propose a unified framework based on contextual refinements to show
exactly how progress properties affect the behaviors of client programs. They give formal operational
definitions of all common progress properties and prove that for linearizable objects, each progress
property is equivalent to a specific type of contextual refinement that preserves termination. The
equivalence ensures that verification of such a contextual refinement for a concurrent object
guarantees both linearizability and the corresponding progress property. Contextual refinement also
enables them to verify safety and liveness properties of client programs at a high abstraction level
by soundly replacing concrete method implementations with abstract atomic operations.


Quantitative Reasoning for Proving Lock-Freedom (LICS’13) In this work, the PI and his
team present a novel quantitative proof technique for the modular and local verification of
lock-freedom. In contrast to proofs based on temporal rely-guarantee requirements, this new quantitative
reasoning method can be directly integrated in modern program logics that are designed for the
verification of safety properties. Using a single formalism for verifying memory safety and
lock-freedom allows a combined correctness proof that verifies both properties simultaneously. This
work presents one possible formalization of this quantitative proof technique by developing a variant
of concurrent separation logic (CSL) for total correctness. To enable quantitative reasoning, CSL
is extended with a predicate for affine tokens to account for, and provide an upper bound on, the
number of loop iterations in a program. Lock-freedom is then reduced to total-correctness proofs.
Quantitative reasoning is demonstrated in detail, both informally and formally, by verifying the
lock-freedom of Treiber’s non-blocking stack. Furthermore, it is shown how the technique is used
to verify the lock-freedom of more advanced shared-memory data structures that use elimination
backoff schemes and hazard pointers.


Compositional Verification of a Baby Virtual Memory Manager (CPP’12) A virtual memory
manager (VMM) is a part of an operating system that provides the rest of the kernel with an abstract
model of memory. Although small in size, it involves complicated and interdependent invariants
that make monolithic verification of the VMM and the kernel running on top of it difficult. In this
work, the PI and his team make the observation that a VMM is constructed in layers: physical page
allocation, page table drivers, address space API, etc., each layer providing an abstraction that the
next layer utilizes. They use this layering to simplify the verification of individual modules of the VMM
and then to link them together by composing a series of small refinements. The compositional
verification also supports function calls from less abstract layers into more abstract ones, allowing
them to simplify the verification of initialization functions as well. To facilitate such compositional
verification, they develop a framework that assists in the creation of verification systems for each layer
and refinements between the layers. Using this framework, they have produced a certification of
BabyVMM, a small VMM designed for simplified hardware. The same proof also shows that a
certified kernel using BabyVMM’s virtual memory abstraction can be refined following a similar
sequence of refinements, and can then be safely linked with BabyVMM. Both the verification
framework and the entire certification of BabyVMM have been mechanized in the Coq Proof
Assistant.


A Case for Behavior-Preserving Actions in Separation Logic (APLAS’12) Separation Logic
is a widely-used tool that allows for local reasoning about imperative programs with pointers.
A straightforward definition of this “local reasoning” is that, whenever a program runs safely
on some state, any additional state has no effect on the program’s behavior. In the presence of
nondeterminism, however, local reasoning must be defined as something more subtle; specifically,
additional state is allowed to decrease the amount of nondeterminism of the program. This subtlety
causes difficulty in proving various metatheoretical facts about Separation Logic and its variants.
Four specific examples are: (1) specifying the behavior of a program on its minimal footprint does
not provide a complete specification; (2) data refinement requires a rather unintuitive restriction
that the memory used by an abstract module be a subset of the memory used by a concrete module
refining the abstract one; (3) Relational Separation Logic requires quite a bit of additional work
to prove the frame rule sound; and (4) it is quite tricky to define a model of Separation Logic in
which the total domain of memory is finite. In this work, the PI and his student show how to cleanly
resolve all of these issues by strengthening the definition of local reasoning to eliminate the subtlety.
They contend that this solution will also similarly resolve future metatheoretical issues.


Modular Verification of Concurrent Thread Management (APLAS’12) Thread management
is an essential functionality in OS kernels. However, verification of thread management remains a
challenge, due to two conflicting requirements: on the one hand, a thread manager (operating below
the thread abstraction layer) should hide its implementation details and be verified independently
from the threads being managed; on the other hand, the thread management code in many
real-world systems is concurrent and might be executed by the threads being managed, so it seems
inappropriate to abstract threads away in the verification of thread managers. Previous approaches
to kernel verification view thread managers as sequential code, and thus cannot be applied to thread
management in realistic kernels. In this work, the PI and his team propose a novel two-layer
framework to verify concurrent thread management. They choose a lower abstraction level than
the previous approaches, where they abstract away the context switch routine only, and allow
the rest of the thread management code to run concurrently in the upper level. They also treat
thread management data as abstract resources so that threads in the environment can be specified in
assertions and be reasoned about in a proof system similar to concurrent separation logic.


VeriML: A Dependently-Typed, User-Extensible, and Language-Centric Approach to Proof
Assistants Software certification is a promising approach to producing programs which are virtually
free of bugs. It requires the construction of a formal proof which establishes that the code in question
will behave according to its specification, a higher-level description of its functionality. The
construction of such formal proofs is carried out in tools called proof assistants. Advances in the
current state-of-the-art proof assistants have enabled the certification of a number of complex and
realistic systems software.

Despite such success stories, large-scale proof development is an arcane art that requires
significant manual effort and is extremely time-consuming. The widely accepted best practice for
limiting this effort is to develop domain-specific automation procedures to handle all but the most
essential steps of proofs. Yet this practice is rarely followed or needs comparable development effort
as well. This is due to a profound architectural shortcoming of existing proof assistants: developing
automation procedures is currently overly complicated and error-prone. It involves the use of an
amalgam of extension languages, each with a different programming model and a set of limitations,
and with significant interfacing problems between them.

This thesis by Antonis Stampoulis (supervised by the PI) posits that this situation can be
significantly improved by designing a proof assistant with extensibility as the central focus. Towards
that effect, Stampoulis and the PI have designed a novel programming language called VeriML,
which combines the benefits of the different extension languages used in current proof assistants
while eschewing their limitations. The key insight of the VeriML design is to combine a rich
programming model with a rich type system, which retains at the level of types information
about the proofs manipulated inside automation procedures. The effort required for writing new
automation procedures is significantly reduced by leveraging this typing information accordingly.

They show that generalizations of the traditional features of proof assistants are a direct
consequence of the VeriML design. Therefore the language itself can be seen as the proof assistant in
its entirety and also as the single language the user has to master. Also, they show how traditional
automation mechanisms offered by current proof assistants can be programmed directly within the
same language; users are thus free to extend them with domain-specific sophistication of arbitrary
complexity. In the dissertation they present all aspects of the VeriML language: the formal
definition of the language; an extensive study of its metatheoretic properties; the details of a complete
prototype implementation; and a number of examples implemented and tested in the language.

Static and User-Extensible Proof Checking (POPL'12)  Despite recent successes, large-scale
proof development within proof assistants remains an arcane art that is extremely time-consuming.
The PI and his team argue that this can be attributed to two profound shortcomings in the architecture
of modern proof assistants. The first is that proofs need to include a large amount of minute detail;
this is due to the rigidity of the proof checking process, which cannot be extended with
domain-specific knowledge. In order to avoid these details, they rely on developing and using tactics,
specialized procedures that produce proofs. Unfortunately, tactics are both hard to write and hard to
use, revealing the second shortcoming of modern proof assistants. This is because there is no static
knowledge about their expected use and behavior. As has recently been demonstrated, languages
that allow type-safe manipulation of proofs, like Beluga, Delphin and VeriML, can be used to partly
mitigate this second issue, by assigning rich types to tactics. Still, the architectural issues remain.
In this work, the PI and his team build on this existing work, and demonstrate two novel ideas: an
extensible conversion rule and support for static proof scripts. Together, these ideas enable us to
support both user-extensible proof checking, and sophisticated static checking of tactics, leading
to a new point in the design space of future proof assistants. Both ideas are based on the interplay
between a light-weight staging construct and the rich type information available.

5  CONCLUSIONS


Operating  System  (OS)  kernels  form  the  bedrock  of  all  system  software — they  can  have  the  greatest 
impact  on  the  resilience,  extensibility,  and  security  of  today’s  computing  hosts.  A  single  kernel  bug 
can  easily  wreck  the  entire  system’s  integrity  and  protection.  During  the  last  four  years,  the  PI  and 
his  team  at  Yale  have  successfully  designed  and  implemented  a  clean-slate  CertiKOS  hypervisor 
kernel  that  runs  on  Intel  and  AMD  multicore  platforms  and  supports  Linux  and  ROS  applications 
on  Landshark  UGVs  with  good  performance.  They  have  also  developed  new  certified  programming 
methodologies  and  tools  that  support  programming  and  composing  certified  abstraction  layers  (in 
C  and  assembly)  and  verify  contextual  safety,  correctness,  liveness,  and  security  properties  in  one 
unified  setting.  They  developed  a  fully  specified  and  verified  single-core  mCertiKOS  kernel  in
Coq  that  is  highly  compositional  with  formally  specified  layers  and  strong  contextual  correctness 
guarantees.  They  also  developed  new  semantics  and  logics  for  reasoning  about  declarative  and 
decentralized  information  flow  control  with  declassification,  new  certified  resource  analysis  tools,
and  new  logics  for  verifying  safety  and  liveness  properties  of  fine-grained  concurrent  programs. 
Finally,  they  developed  new  proof  automation  support  including  the  design  and  implementation  of 
the  VeriML  language  and  new  Coq  Ltac  libraries. 

Traditional  OS  kernels  use  a  hardware-enforced  “red  line”  to  isolate  the  behaviors  of  user
programs  and  to  protect  the  integrity  of  the  kernel  code.  The  PI  and  his  team’s  new  layered  approach 
to  certified  kernels  replaces  the  red  line  with  a  large  number  of  abstraction  layers  enforced  via 
formal  specification  and  proofs.  They  believe  this  will  open  up  a  whole  new  dimension  of  research 
efforts  toward  building  truly  reliable,  secure,  and  extensible  system  software. 

Abstract

Despite recent successes, large-scale proof development within
proof assistants remains an arcane art that is extremely
time-consuming. We argue that this can be attributed to two profound
shortcomings in the architecture of modern proof assistants. The
first is that proofs need to include a large amount of minute detail;
this is due to the rigidity of the proof checking process, which
cannot be extended with domain-specific knowledge. In order to avoid
these details, we rely on developing and using tactics, specialized
procedures that produce proofs. Unfortunately, tactics are both hard
to write and hard to use, revealing the second shortcoming of
modern proof assistants. This is because there is no static knowledge
about their expected use and behavior.

As  has  recently  been  demonstrated,  languages  that  allow  type- 
safe  manipulation  of  proofs,  like  Beluga,  Delphin  and  VeriML, 
can  be  used  to  partly  mitigate  this  second  issue,  by  assigning  rich 
types  to  tactics.  Still,  the  architectural  issues  remain.  In  this  paper, 
we  build  on  this  existing  work,  and  demonstrate  two  novel  ideas: 
an  extensible  conversion  rule  and  support  for  static  proof  scripts. 
Together,  these  ideas  enable  us  to  support  both  user-extensible 
proof  checking,  and  sophisticated  static  checking  of  tactics,  leading 
to  a  new  point  in  the  design  space  of  future  proof  assistants.  Both 
ideas  are  based  on  the  interplay  between  a  light-weight  staging 
construct  and  the  rich  type  information  available. 

Categories and Subject Descriptors  D.3.1 [Programming
Languages]: Formal Definitions and Theory

General  Terms  Languages,  Verification 

1.  Introduction 

There have been various recent successes in using proof assistants
to construct foundational proofs of large software, like a C
compiler [Leroy 2009] and an OS microkernel [Klein et al. 2009], as
well as complicated mathematical proofs [Gonthier 2008]. Despite
this success, the process of large-scale proof development using
the foundational approach remains a complicated endeavor that
requires significant manual effort and is plagued by various
architectural issues.

The big benefit of using a foundational proof assistant is that
the proofs involved can be checked for validity using a very small
proof checking procedure. The downside is that these proofs are
very large, since proof checking is fixed. There is no way to add
domain-specific knowledge to the proof checker, which would
enable proofs that spell out fewer details. There is good reason for this,
too: if we allowed arbitrary extensions of the proof checker, we
could very easily permit it to accept invalid proofs.

Because of this lack of extensibility in the proof checker, users
rely on tactics: procedures that produce proofs. Users are free to
write their own tactics, which can create domain-specific proofs. In
fact, developing domain-specific tactics is considered to be good
engineering when doing large developments, leading to
significantly decreased overall effort - as shown, e.g., in Chlipala [2011].
Still, using and developing tactics is error-prone. Tactics are
essentially untyped functions that manipulate logical terms, and thus
tactic programming is untyped. This means that common errors, like
passing the wrong argument, or expecting the wrong result, are not
caught statically. Exacerbating this, proofs contained within tactics
are not checked statically, when the tactic is defined. Therefore,
even if the tactic is used correctly, it could contain serious bugs that
manifest only under some conditions.

With the recent advent of programming languages that
support strongly typed manipulation of logical terms, such as Beluga
[Pientka and Dunfield 2008], Delphin [Poswolsky and Schürmann
2008] and VeriML [Stampoulis and Shao 2010], this situation can
be somewhat mitigated. It has been shown in Stampoulis and Shao
[2010] that we can specify what kinds of arguments a tactic expects
and what kind of proof it produces, leading to a type-safe
programming style. Still, this does not address the fundamental problem of
proof checking being fixed - users still have to rely on using
tactics. Furthermore, the proofs contained within the type-safe tactics
are in fact proof-producing programs, which need to be evaluated
upon invocation of the tactic. Therefore proofs within tactics are
not checked statically, and they can still cause the tactics to fail
upon invocation.

In  this  paper,  we  build  on  the  past  work  on  these  languages, 
aiming  to  solve  both  of  these  issues  regarding  the  architecture  of 
modern  proof  assistants.  We  introduce  two  novel  ideas:  support 
for  an  extensible  conversion  rule  and  static  proof  scripts  inside 
tactics.  The  former  technique  enables  proof  checking  to  become 
user-extensible,  while  maintaining  the  guarantee  that  only  logically 
sound  proofs  are  admitted.  The  latter  technique  allows  for  statically 
checking  the  proofs  contained  within  tactics,  leading  to  increased 
guarantees  about  their  runtime  behavior.  Both  techniques  are  based 
on  the  same  mechanism,  which  consists  of  a  light-weight  staging 
construct.  There  is  also  a  deep  synergy  between  them,  allowing  us 
to  use  the  one  to  the  benefit  of  the  other. 

Our  main  contributions  are  the  following: 

•  First, we present what we believe is the first technique for
having an extensible conversion rule, which combines the
following characteristics: it is safe, meaning that it preserves logical
soundness; it is user-extensible, using a familiar, generic
programming model; and, it does not require metatheoretic
additions to the logic, but can be used to simplify the logic instead.

•  Second,  building  on  existing  work  for  typed  tactic  development, 
we  introduce  static  checking  of  the  proof  scripts  contained 
within  tactics.  This  significantly  reduces  the  development  effort 
required,  allowing  us  to  write  tactics  that  benefit  from  existing 
tactics  and  from  the  rich  type  information  available. 

•  Third,  we  show  how  typed  proof  scripts  can  be  seen  as  an 
alternative  form  of  proof  witness,  which  falls  between  a  proof 
object  and  a  proof  script.  Receivers  of  the  certificate  are  able  to 
decide  on  the  tradeoff  between  the  level  of  trust  they  show  and 
the  amount  of  resources  needed  to  check  its  validity. 

In terms of technical contributions, we present a number of
technical advances in the metatheory of the aforementioned
programming languages. These include a simple staging construct that is
crucial to our development and a new technique for variable
representation. We also show a condition under which static checking
of proof scripts inside tactics is possible. Last, we have extended
an existing prototype implementation with a significant number of
features, enabling it to support our claims, while also rendering its
use as a proof assistant more practical.

2.  Informal  presentation 

Glossary of terms. We will start off by introducing some
concepts that will be used throughout the paper. The first fundamental
concept we will consider is the notion of a proof object: given a
derivation of a proposition inside a formal logic, a proof object is a
term representation of this derivation. A proof checker is a program
that can decide whether a given proof object is a valid derivation
of a specific proposition or not. Proof objects are extremely
verbose and are thus hard to write by hand. For this reason, we use
tactics: functions that produce proof objects. By combining tactics
together, we create proof-producing programs, which we call proof
scripts. If a proof script is evaluated, and the evaluation completes
successfully, the resulting proof object can be checked using the
original proof checker. In this way, the trusted base of the system
is kept at the absolute minimum. The language environment where
proof scripts and tactics are written and evaluated is called a proof
assistant; evidently, it needs to include a proof checker.
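
The glossary terms can be made concrete with a minimal Python sketch. It assumes a toy implicational logic; the names `Atom`, `Implies`, `Hyp`, `ModusPonens`, `check` and `chain` are hypothetical and are not part of VeriML. The proof object is a plain data structure, the checker is the only trusted function, and the tactic is ordinary untrusted code whose output the checker re-validates.

```python
from dataclasses import dataclass

# Propositions of a toy implicational logic (illustrative assumption).
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Implies:
    left: object
    right: object

# Proof objects: term representations of derivations.
@dataclass(frozen=True)
class Hyp:            # use of a hypothesis
    prop: object

@dataclass(frozen=True)
class ModusPonens:    # from A -> B and A, conclude B
    imp: object
    arg: object

def check(proof, hyps):
    """Proof checker (trusted): return the proposition proved, or raise."""
    if isinstance(proof, Hyp):
        if proof.prop not in hyps:
            raise ValueError("unknown hypothesis")
        return proof.prop
    if isinstance(proof, ModusPonens):
        imp = check(proof.imp, hyps)
        arg = check(proof.arg, hyps)
        if not isinstance(imp, Implies) or imp.left != arg:
            raise ValueError("modus ponens mismatch")
        return imp.right
    raise ValueError("not a proof object")

def chain(goal, hyps, depth=8):
    """Tactic (untrusted): search for a proof object by backward chaining."""
    if goal in hyps:
        return Hyp(goal)
    if depth == 0:
        return None
    for h in hyps:
        if isinstance(h, Implies) and h.right == goal:
            sub = chain(h.left, hyps, depth - 1)
            if sub is not None:
                return ModusPonens(Hyp(h), sub)
    return None
```

A proof script in this sketch is simply a call such as `chain(goal, hyps)`; its result is only accepted once `check` confirms that it proves the intended goal.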

Checking proof objects. In order to keep the size of proof objects
manageable, many of the logics used for mechanized proof
checking include a conversion rule. This rule is used implicitly by the
proof checker to decide whether any two propositions are
equivalent; if it determines that they are indeed so, the proof of their
equivalence can be omitted. We can thus think of it as a special
tactic that is embedded within the proof checker, and used implicitly.

The  more  sophisticated  the  relation  supported  by  the  conversion 
rule  is,  the  simpler  are  proof  objects  to  write,  since  more  details  can 
be  omitted.  On  the  other  hand,  the  proof  checker  becomes  more 
complicated,  as  does  the  metatheory  proof  showing  the  soundness 
of  the  associated  logic.  The  choice  in  Coq  [Barras  et  al.  2010], 
one  of  the  most  widely  used  proof  assistants,  with  respect  to  this 
trade-off,  is  to  have  a  conversion  rule  that  identifies  propositions 
up  to  evaluation.  Nevertheless,  extended  notions  of  conversion  are 
desirable,  leading  to  proposals  like  CoqMT  [Strub  2010],  where 
equivalence  up  to  first-order  theories  is  supported.  In  both  cases, 
the  conversion  rule  is  fixed,  and  extending  it  requires  significant 
amounts  of  work.  It  is  thus  not  possible  for  users  to  extend  it  using 
their  own,  domain-specific  tactics,  and  proof  objects  are  thus  bound 
to  get  large.  This  is  why  we  have  to  resort  to  writing  proof  scripts. 
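
As a rough illustration of a fixed conversion rule that identifies terms up to evaluation, consider the following Python sketch. The term language and the names `Num`, `Var`, `Add`, `evaluate`, `convertible` and `check_refl` are assumptions made for this example; they do not reproduce Coq's actual conversion algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

def evaluate(term):
    """Normalize by evaluation: fold additions of closed numerals only."""
    if isinstance(term, Add):
        left, right = evaluate(term.left), evaluate(term.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)
    return term

def convertible(a, b):
    """Fixed conversion rule: two terms are identified if they evaluate alike."""
    return evaluate(a) == evaluate(b)

def check_refl(lhs, rhs):
    """Accept the trivial proof 'refl' of lhs = rhs iff conversion relates them."""
    return convertible(lhs, rhs)
```

With this rule, `2 + 3 = 5` needs no explicit proof, but `x + y = y + x` is rejected: commutativity is not part of evaluation, so a proof object for it has to be spelled out unless the rule itself is extended.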

Checking  proof  scripts.  As  mentioned  earlier,  in  order  to  validate 
a  proof  script  we  need  to  evaluate  it  (see  Fig.  1a);  this  is  the
modus  operandi  in  proof  assistants  of  the  HOL  family  [Harrison 
1996;  Slind  and  Norrish  2008].  Therefore,  it  is  easy  to  extend  the 
checking  procedure  for  proof  scripts  by  writing  a  new  tactic,  and 
calling  it  as  part  of  a  script.  The  price  of  this  is  that  there
is  no  way  to  have  any  sort  of  static  guarantee  about  the  validity 
of  the  script,  as  proof  scripts  are  completely  untyped.  This  can  be 
somewhat  mitigated  in  Coq  by  utilizing  the  static  checking  that  it 
already  supports:  the  proof  checker,  and  especially,  the  conversion 
rule  it  contains  (see  Fig.  1b).  We  can  employ  proof  objects  in
our  scripts;  this  is  especially  useful  when  the  proof  objects  are 
trivial  to  write  but  trigger  complex  conversion  checks.  This  is  the 
essential  idea  behind  techniques  like  proof-by-reflection  [Boutin 
1997],  which  lead  to  more  robust  proof  scripts. 

In  previous  work  [Stampoulis  and  Shao  2010]  we  introduced 
VeriML,  a  language  that  enables  programming  tactics  and  proof 
scripts  in  a  typeful  manner  using  a  general-purpose,  side-effectful 
programming  model.  Combining  typed  tactics  leads  to  typed  proof 
scripts.  These  are  still  programs  producing  proof  objects,  but  the 
proposition  they  prove  is  carried  within  their  type.  Information 
about  the  current  proof  state  (the  set  of  hypotheses  and  goals)  is  also 
available  statically  at  every  intermediate  point  of  the  proof  script.  In 
this  way,  the  static  assurances  about  proof  scripts  are  significantly 
increased  and  many  potential  sources  of  type  errors  are  removed. 
On  the  other  hand,  the  proof  objects  contained  within  the  scripts 
are  still  checked  using  a  fixed  proof  checker;  this  ultimately  means 
that  the  set  of  possible  static  guarantees  is  still  fixed. 

Extensible conversion rule. In this paper, we build on our earlier
work on VeriML. In order to further increase the amount of static
checking of proof scripts that is possible within this language, we
propose the notion of an extensible conversion rule (see Fig. 1c). It
enables users to write their own domain-specific conversion checks
that get included in the conversion rule. This leads to simpler proof
scripts, as more parts of the proof can be inferred by the conversion
rule and can therefore be omitted. Also, it leads to increased static
guarantees for proof scripts, since the conversion checks happen
before the rest of the proof script is evaluated.

[Figure 2. Staging in VeriML. The diagram did not survive extraction; only the label "dynamic" remains.]

Syntax and judgement assumed for the logic language (contents of Fig. 3):

t ::= proof object constructors | propositions
    | natural numbers, lists, etc. | sorts and types | X/σ
Φ ::= • | Φ, x : t        T ::= [Φ] t
Ψ ::= • | Ψ, X : T        σ ::= • | σ, x ↦ t
main judgement: Ψ; Φ ⊢ t : t'   (type of a logical term)

The way we achieve this is by programming the conversion
checks as type-safe tactics within VeriML, and then evaluating
them statically using a simple staging mechanism (see Fig. 2). The
type of the conversion tactics requires that they produce a proof
object which proves the claimed equivalence of the propositions. In
this way, type safety of VeriML guarantees that soundness is
maintained. At the same time, users are free to extend the conversion
rule with their own conversion tactics written in a familiar
programming model, without requiring any metatheoretic additions or
termination proofs. Such proofs are only necessary if decidability of
the extra conversion checks is desired. Furthermore, this approach
allows for metatheoretic reductions as the original conversion rule
can be programmed within the language. Thus it can be removed
from the logic, and replaced by the simpler notion of explicit
equalities, leading to both simpler metatheory and a smaller trusted base.
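
The idea of user-supplied conversion checks that must justify themselves with explicit equality proofs can be sketched in Python as follows. This is only an approximation under stated assumptions: VeriML obtains the guarantee statically from the type of the conversion tactic, whereas Python has no such types, so the sketch re-checks each returned proof with a small trusted kernel (`proves`). All names here (`proves`, `conversion_tactic`, `convert`, `by_comm`, `buggy`) are hypothetical.

```python
def proves(proof):
    """Trusted kernel: return the (lhs, rhs) equation an equality proof establishes."""
    tag = proof[0]
    if tag == "refl":
        return proof[1], proof[1]
    if tag == "comm":                      # add(a, b) = add(b, a)
        _, a, b = proof
        return ("add", a, b), ("add", b, a)
    if tag == "trans":
        lhs1, rhs1 = proves(proof[1])
        lhs2, rhs2 = proves(proof[2])
        if rhs1 != lhs2:
            raise ValueError("trans: middle terms differ")
        return lhs1, rhs2
    raise ValueError("unknown proof rule")

CONVERSION_TACTICS = []                    # user-extensible registry

def conversion_tactic(fn):
    """Register a user-written conversion check."""
    CONVERSION_TACTICS.append(fn)
    return fn

def convert(a, b):
    """Extensible conversion: accept a = b only with a kernel-checked proof."""
    for tactic in CONVERSION_TACTICS:
        proof = tactic(a, b)
        if proof is None:
            continue
        try:
            if proves(proof) == (a, b):
                return proof
        except ValueError:
            pass                           # ill-formed proof: ignore this tactic
    return None

@conversion_tactic
def by_refl(a, b):
    return ("refl", a) if a == b else None

@conversion_tactic
def by_comm(a, b):
    if a[0] == "add" and b == ("add", a[2], a[1]):
        return ("comm", a[1], a[2])
    return None

@conversion_tactic
def buggy(a, b):
    return ("refl", a)                     # wrongly claims everything is convertible
```

The `buggy` tactic shows why soundness is preserved: its proof establishes `a = a`, not `a = b`, so `convert` never accepts it for distinct terms. Extending the rule requires no change to `proves`.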

Checking  tactics.  The  above  approach  addresses  the  issue  of 
being  able  to  extend  the  amount  of  static  checking  possible  for 
proof  scripts.  But  what  about  tactics?  Our  existing  work  on  VeriML 
shows  how  the  increased  type  information  addresses  some  of  the 
issues  of  tactic  development  using  current  proof  assistants,  where 
tactics  are  programmed  in  a  completely  untyped  manner. 

Still,  if  we  consider  the  case  of  tactics  more  closely,  we  will 
see  that  there  is  a  limitation  to  the  amount  of  checking  that  is 
done  statically,  even  using  this  language.  When  programming  a 
new  tactic,  we  would  like  to  reuse  existing  tactics  to  produce  the 
required  proofs.  Therefore,  rather  than  writing  proof  objects  by 
hand  inside  the  code  of  a  tactic,  we  would  rather  use  proof  scripts. 
The  issue  is  that  in  order  to  check  whether  the  contained  proof 
scripts  are  valid,  they  need  to  be  evaluated  -  but  this  only  happens 
when  an  invocation  of  the  tactic  reaches  the  point  where  the  proof 
script  is  used.  Therefore,  the  static  guarantees  that  this  approach 
provides  are  severely  limited  by  the  fact  that  the  proof  scripts  inside 
the  tactics  cannot  be  checked  statically,  when  the  tactic  is  defined. 

Static proof scripts. This is the second fundamental issue we
address in this paper. We show that the same staging construct
utilized for introducing the extensible conversion rule can be
leveraged to perform static proof checking for tactics. The crucial point
of our approach is the proof of existence of a transformation
between proof objects, which suggests that under reasonable
conditions, a proof script contained within a tactic can be transformed
into a static proof script. This static script can then be evaluated at
tactic definition time, to be checked for validity.
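
The effect of evaluating a proof script at tactic definition time can be approximated in Python with a closure. This sketch only illustrates the staging idea for a closed script; it does not model the transformation between proof objects described above, and the names `static`, `ProofScriptError`, `zero_is_even` and `make_double_even` are hypothetical.

```python
class ProofScriptError(Exception):
    """Raised when a staged proof script fails to produce a proof."""

def static(script):
    """Stage a closed proof script: run it immediately and keep its proof."""
    proof = script()
    if proof is None:
        raise ProofScriptError(f"static proof script {script.__name__} failed")
    return proof

EVALUATIONS = []          # records every run of the script, for illustration

def zero_is_even():
    """A closed proof script producing a proof that 0 is even."""
    EVALUATIONS.append("zero_is_even")
    return ("even_zero",)

def make_double_even():
    # The script runs once, here, when the tactic is defined. A failing
    # script would raise now rather than on some later invocation.
    base = static(zero_is_even)

    def double_even(n):
        """Tactic: build a proof that 2*n is even from the staged base case."""
        proof = base
        for _ in range(n):
            proof = ("even_step", proof)
        return proof

    return double_even

double_even = make_double_even()
```

After the definition, `EVALUATIONS` contains a single entry no matter how often `double_even` is invoked, and a script that returns no proof is reported when the tactic is defined.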

Last, we will show that this approach lends itself well to writing
extensions of the conversion rule. We show that we can create a
layering of conversion rules: using a basic conversion rule as a starting
point, we can utilize it inside static proof scripts to implicitly prove
the required obligations of a more advanced version, and so on.
This minimizes the required user effort for writing new conversion
rules, and enables truly modular proof checking.


Figure  3.  Assumptions  about  the  logic  language 

3.  Our  toolbox 

In this section, we will present the essential ingredients that are
needed for the rest of our development. The main requirement is
a language that supports type-safe manipulation of terms of a
particular logic, as well as a general-purpose programming model that
includes general recursion and other side-effectful operations. Two
recently proposed languages for manipulating LF terms, Beluga
[Pientka and Dunfield 2008] and Delphin [Poswolsky and
Schürmann 2008], fit this requirement, as does VeriML [Stampoulis and
Shao 2010], which is a language used to write type-safe tactics. Our
discussion is focused on the latter, as it supports a richer ML-style
calculus compared to the others, something useful for our purposes.
Still, our results apply to all three.

We  will  now  briefly  describe  the  constructs  that  these  languages 
support,  as  well  as  some  new  extensions  that  we  propose.  The 
interested  reader  can  read  more  about  these  constructs  in  Sec.  6 
and  in  our  technical  report  [Stampoulis  and  Shao  2012]. 

A formal logic. The computational language we are presenting is
centered around manipulation of terms of a specific formal logic.
We will see more details about this logic in Sec. 4. For the time
being, it will suffice to present a set of assumptions about the
syntactic classes and typing judgements of this logic, shown in Fig. 3.
Logical terms are represented by the syntactic class t, and include
proof objects, propositions, terms corresponding to the domain of
discourse (e.g. natural numbers), and the needed sorts and type
constructors to classify such terms. Their variables are assigned types
through an ordered context Φ. A package of a logical term t
together with the variables context Φ it inhabits is called a
contextual term and denoted as T = [Φ] t. Our computational language
works over contextual terms for reasons that will be evident later.
The logic incorporates such terms by allowing them to get
substituted for meta-variables X, using the constructor X/σ. When a term
T = [Φ'] t gets substituted for X, we go from the Φ' context to the
current context Φ using the substitution σ.

Logical terms are classified using other logical terms, based on
the normal variables environment Φ, and also an environment Ψ
that types meta-variables, thus leading to the Ψ; Φ ⊢ t : t'
judgement. For example, a term t representing a closed proposition will
be typed as •; • ⊢ t : Prop, while a proof object t_pf proving that
proposition will satisfy the judgement •; • ⊢ t_pf : t.

ML-style functional programming. We move on to the
computational language. As its main core, we assume an ML-style
functional language, supporting general recursion, algebraic data types
and mutable references (see Fig. 4). Terms of this fragment are
typed under a computational variables environment Γ and a store
typing environment Σ, mapping mutable locations to types. Typing
judgements are entirely standard, leading to a Σ; Γ ⊢ e : τ
judgement for typing expressions.

Dependently-typed programming over logical terms. As shown
in Fig. 5, the first important additions to the ML computational core
are constructs for dependent functions and products over contextual
terms T. Abstraction over contextual terms is denoted as λX : T.e. It
has the dependent function type (X : T) → τ. The type is dependent
since the introduced logical term might be used as the type of
another term. An example would be a function that receives a
proposition plus a proof object for that proposition, with type:
(P : Prop) → (X : P) → τ. Dependent products that package a
contextual logical term with an expression are introduced through
the ⟨T, e⟩ construct and eliminated using let ⟨X, x⟩ = e in e'; their
type is denoted as (X : T) × τ. Especially for packages of proof
objects with the unit type, we introduce the syntax LT(T).

k ::= * | k1 → k2
τ ::= unit | int | bool | τ1 → τ2 | τ1 + τ2 | τ1 × τ2 | μα : k.τ
    | ∀α : k.τ | α | array τ | λα : k.τ | τ1 τ2 | ···
e ::= () | n | e1 + e2 | e1 ≤ e2 | true | false | if e then e1 else e2
    | λx : τ.e | e1 e2 | (e1, e2) | proj_i e | inj_i e
    | case(e, x1.e1, x2.e2) | fold e | unfold e | Λα : k.e | e τ
    | fix x : τ.e | mkarray(e, e') | e[e'] | e[e'] := e'' | l | error
Γ ::= • | Γ, x : τ | Γ, α : k        Σ ::= • | Σ, l : array τ

Figure 4. Syntax for the computational language (ML fragment)

τ ::= ··· | (X : T) → τ | (X : T) × τ | (φ : ctx) → τ
e ::= ··· | λX : T.e | e T | λφ : ctx.e | e Φ | ⟨T, e⟩
    | let ⟨X, x⟩ = e in e'
    | holcase T return τ of (T1 ↦ e1) ··· (Tn ↦ en)
    | ctxcase Φ return τ of (Φ1 ↦ e1) ··· (Φn ↦ en)

Figure 5. Syntax for the computational language (logical term
constructs)

Last,  in  order  to  be  able  to  support  functions  that  work  over 
terms  in  any  context,  we  introduce  context  polymorphism,  through 
a  similarly  dependent  function  type  over  contexts.  With  these  in 
mind,  we  can  define  a  simple  tactic  that  gets  a  packaged  proof 
of  a  universally  quantified  formula,  and  an  instantiation  term,  and 
returns  a  proof  of  the  instantiated  formula  as  follows: 

instantiate : (φ : ctx, T : [φ] Type, P : [φ, x : T] Prop, a : [φ] T) →
              LT([φ] ∀x : T, P) → LT([φ] P/[id_φ, a])

instantiate φ T P a pf = let ⟨H⟩ = pf in ⟨H a⟩

From  here  on,  we  will  omit  details  about  contexts  and  substitutions 
in  the  interest  of  presentation. 
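
For readers more familiar with mainstream languages, the behaviour of the instantiate tactic can be imitated in Python. This is a loose sketch under several assumptions: propositions are nested tuples, a packaged proof is a `(proposition, proof)` pair, substitution ignores variable capture, and the check that the input really is a universally quantified formula happens at run time, whereas VeriML establishes it statically through the dependent type. The names `subst` and `instantiate` and the tuple encoding are hypothetical.

```python
def subst(prop, var, term):
    """Replace ("var", var) by term inside prop (capture is not handled)."""
    if prop == ("var", var):
        return term
    if isinstance(prop, tuple):
        if prop[0] == "forall" and prop[1] == var:
            return prop                      # var is shadowed by this binder
        return tuple(subst(p, var, term) if isinstance(p, tuple) else p
                     for p in prop)
    return prop

def instantiate(packaged, a):
    """Given a packaged proof of (forall x, P) and a term a, prove P[a/x]."""
    prop, proof = packaged
    if not (isinstance(prop, tuple) and prop[0] == "forall"):
        raise TypeError("expected a proof of a universally quantified formula")
    _, x, body = prop
    # The new proof object is the application of the old proof to a.
    return (subst(body, x, a), ("app", proof, a))
```

For example, instantiating a packaged proof of `forall x, even(double(x))` with the numeral 3 yields a packaged proof of `even(double(3))` whose proof object is the application of the original proof to 3.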

Pattern  matching  over  terms.  The  most  important  new  construct 
that  VeriML  supports  is  a  pattern  matching  construct  over  logical 
terms  denoted  as  holcase.  This  construct  is  used  for  dependent 
matching  of  a  logical  term  against  a  set  of  patterns.  The  return 
clause  specifies  its  return  type;  we  omit  it  when  it  is  easy  to  infer. 
Patterns  are  normal  terms  that  include  unification  variables,  which 
can  be  present  under  binders.  This  is  the  essential  reason  why 
contextual  terms  are  needed. 

Pattern matching over environments. For the purposes of our
development, it is very useful to support one more pattern matching
construct: matching over logical variable contexts. When trying to
construct a certain proof, the logical environment represents what
the current proof context is: what the current logical hypotheses at
hand are, what types of terms have been quantified over, etc. By
being able to pattern match over the environment, we can “look up”
things in our current set of hypotheses, in order to prove further
propositions. We can thus view the current environment as
representing a simple form of the current proof state; the pattern
matching construct enables us to manipulate it in a type-safe manner.


One example is an “assumption” tactic, which tries to prove a proposition by searching for a matching hypothesis in the context:

    assumption : (φ : ctx, P : Prop) → option LT(P)
    assumption φ P =
      ctxcase φ of
          φ', H : P ↦ return ⟨H⟩
        | φ', _     ↦ assumption φ' P
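The context search performed by the assumption tactic can be illustrated with a small Python sketch. This is only an analogy, not VeriML code: the context is modelled as a plain list of (name, proposition) pairs, propositions are strings compared for syntactic equality, and the returned hypothesis name stands in for the packaged proof object. The explicit empty-context case is an assumption added here so that the function is total.

```python
from typing import Optional

# A context is modelled as a list of (hypothesis_name, proposition) pairs,
# with the most recently introduced hypothesis last (hypothetical encoding).
Context = list[tuple[str, str]]


def assumption(ctx: Context, goal: str) -> Optional[str]:
    """Return the name of a hypothesis whose proposition is `goal`, or None."""
    if not ctx:
        return None                    # empty context: nothing can match
    *rest, (name, prop) = ctx          # split off the innermost hypothesis
    if prop == goal:
        return name                    # plays the role of `return <H>`
    return assumption(rest, goal)      # otherwise search the smaller context
```

The sketch loses what the VeriML type guarantees: there, a successful result is statically known to be a proof of exactly the requested proposition.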

Proof object erasure semantics (new feature). The only construct that can influence the evaluation of a program based on the structure of a logical term is the pattern matching construct. For our purposes, pattern matching on proof objects is not necessary: we never look into the structure of a completed proof. Thus we can have the typing rules of the pattern matching construct specifically disallow matching on proof objects.

In  that  case,  we  can  define  an  alternate  operational  semantics  for 
our  language  where  all  proof  objects  are  erased  before  using  the 
original  small-step  reduction  rules.  Because  of  type  safety,  these 
proof-erasure  semantics  are  guaranteed  to  yield  equivalent  results: 
even  if  no  proof  objects  are  generated,  they  are  still  bound  to  exist. 

Implicit arguments. Let us consider again the instantiate function defined earlier. This function expects five arguments. From its type alone, it is evident that only the last two arguments are strictly necessary. The last argument, corresponding to a proof expression for the proposition $\forall x : T, P$, can be used to reconstruct exactly the arguments $\phi$, $T$ and $P$. Furthermore, if we know what the resulting type of a call to the function needs to be, we can choose even the instantiation argument $a$ appropriately. We employ a simple inference mechanism so that such arguments are omitted from our programs. This feature is also crucial in our development in order to implicitly maintain and utilize the current proof state within our proof scripts.

Minimal staging support (new feature). Using the language we have seen so far we are able to write powerful tactics using a general-purpose programming model. But what if, inside our programs, we have calls to tactics where all of their arguments are constant? Presumably, those tactic calls could be evaluated to proof objects prior to tactic invocation. We could think of this as a form of generalized constant folding, which has one intriguing benefit: we can tell statically whether the tactic calls succeed or not.

This paper is exactly about exploring this possibility. To this end, we introduce a rudimentary staging construct in our computational language. This takes the form of a letstatic construct, which binds a static expression to a variable. The static expression is evaluated during stage one (see Fig. 2), and can only depend on other static expressions. Details of this construct are presented in Fig. 11d and also in Sec. 6. After this addition, expressions in our language have a three-phase lifetime, which is also shown in Fig. 2:

- type-checking, where the well-formedness of expressions according to the rules of the language is checked, and inference of implicit arguments is performed

- static evaluation, where expressions inside letstatic are reduced to values, yielding a residual expression

- run-time, where the residual expression is evaluated
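The last two phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the VeriML implementation: the expression type (Const, Var, Add, LetStatic) and the functions stage_one and run are hypothetical names, the static computation is modelled as a closed Python callable, and the type-checking phase is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Const:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class LetStatic:
    name: str
    static: Callable[[], int]   # closed computation; may raise an exception
    body: "Expr"


Expr = Union[Const, Var, Add, LetStatic]


def stage_one(e, env=None):
    """Static evaluation: run every letstatic now and return a residual
    expression that no longer contains LetStatic nodes."""
    env = env or {}
    if isinstance(e, Const):
        return e
    if isinstance(e, Var):
        return Const(env[e.name]) if e.name in env else e
    if isinstance(e, Add):
        return Add(stage_one(e.left, env), stage_one(e.right, env))
    if isinstance(e, LetStatic):
        value = e.static()      # a failure surfaces here, before run time
        return stage_one(e.body, {**env, e.name: value})
    raise TypeError(f"unknown expression: {e!r}")


def run(e, runtime_env):
    """Run-time evaluation of a residual expression."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return runtime_env[e.name]
    if isinstance(e, Add):
        return run(e.left, runtime_env) + run(e.right, runtime_env)
    raise TypeError("residual expressions must not contain LetStatic")
```

Because the static computation is replaced by its value in the residual expression, evaluating the residual expression repeatedly never re-runs it, and an exception raised by it is reported during stage one rather than at run time.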

4.  Extensible  conversion  rule 

With these tools at hand, let us now return to the first issue that motivates us: the fact that proof checking is rigid and cannot be extended with user-defined procedures. As we have said in our introduction, many modern proof assistants are based on logics that include a conversion rule. This rule essentially identifies propositions up to some equivalence relation: usually this is equivalence up to partial evaluation of the functions contained within propositions.

$$\frac{\Psi;\ \Phi \vdash_e d : \mathcal{K}}{\Psi;\ \Phi \vdash_e \mathsf{refl}\ d : d = d}\ (\text{Refl})$$

(sorts)          $s ::= \mathsf{Type} \mid \mathsf{Type}'$
(kinds)          $\mathcal{K} ::= \mathsf{Prop} \mid \mathsf{Nat} \mid \mathcal{K}_1 \to \mathcal{K}_2$
(props.)         $P ::= P_1 \to P_2 \mid \forall x : \mathcal{K}.P \mid x \mid \mathsf{True} \mid \mathsf{False} \mid P_1 \wedge P_2 \mid \cdots$
(dom.obj.)       $d ::= \mathsf{Zero} \mid \mathsf{Succ}\ d \mid P \mid \cdots$
(proof objects)  $\pi ::= x \mid \lambda x : P.\pi \mid \pi_1\ \pi_2 \mid \lambda x : \mathcal{K}.\pi \mid \pi\ d \mid \cdots$
(HOL terms)      $t ::= s \mid \mathcal{K} \mid P \mid d \mid \pi$

Selected rules:

$$\frac{\Psi;\ \Phi, x : P \vdash \pi : P'}{\Psi;\ \Phi \vdash \lambda x : P.\pi : P \to P'}\ (\to\text{Intro})$$

$$\frac{\Psi;\ \Phi \vdash \pi : P \to P' \qquad \Psi;\ \Phi \vdash \pi' : P}{\Psi;\ \Phi \vdash \pi\ \pi' : P'}\ (\to\text{Elim})$$

Figure 6. Syntax and selected rules of the logic language $\lambda$HOL

$$\frac{\Psi;\ \Phi \vdash_c \pi : P \qquad P =_{\beta N} P'}{\Psi;\ \Phi \vdash_c \pi : P'}\ (\text{Conversion})$$

$(\lambda x : \mathcal{K}.d)\ d' \to_{\beta N} d[d'/x]$
$\mathsf{natElim}_{\mathcal{K}}\ d_z\ d_s\ \mathsf{zero} \to_{\beta N} d_z$
$\mathsf{natElim}_{\mathcal{K}}\ d_z\ d_s\ (\mathsf{succ}\ d) \to_{\beta N} d_s\ d\ (\mathsf{natElim}_{\mathcal{K}}\ d_z\ d_s\ d)$

$=_{\beta N}$ is the compatible, reflexive, symmetric and transitive closure of $\to_{\beta N}$

Figure 7. Extending $\lambda$HOL with the conversion rule ($\lambda$HOL$_c$)


The supported relation is decided when the logic is designed. Any extension to this relation requires a significant amount of work, both in terms of implementation, and in terms of metatheoretic proof required. This is evidenced by projects that extend the conversion rule in Coq, such as Blanqui et al. [1999] and Strub [2010]. Even if user extensions are supported, those only take the form of first-order theories. Can we do better than this, enabling arbitrarily complex user extensions, written with the full power of ML, yet maintaining soundness?

It  turns  out  that  we  can:  this  is  the  subject  of  this  section.  The 
key  idea  is  to  recognize  that  the  conversion  rule  is  essentially  a 
tactic,  embedded  within  the  type  checker  of  the  logic.  Calls  to 
this  tactic  are  made  implicitly  as  part  of  checking  a  given  proof 
object  for  validity.  So  how  can  we  support  a  flexible,  extensible 
alternative?  Instead  of  hardcoding  a  conversion  tactic  within  the 
logic  type  checker,  we  can  program  a  type-safe  version  of  the  same 
tactic  within  VeriML,  with  the  requirement  that  it  provides  proof  of 
the  claimed  equivalence.  Instead  of  calling  the  conversion  tactic  as 
part  of  proof  checking,  we  use  staging  to  call  the  tactic  statically 
-  after  (VeriML)  type  checking,  but  before  runtime  execution. 
This  can  be  viewed  as  a  second,  potentially  non-terminating  proof 
checking  stage.  Users  are  now  free  to  write  their  own  conversion 
tactics,  extending  the  static  checking  available  for  proof  objects  and 
proof  scripts.  Still,  soundness  is  maintained,  since  full  proof  objects 
in  the  original  logic  can  always  be  constructed.  As  an  example, 
we  have  extended  the  conversion  rule  that  we  use  by  a  congruence 
closure  procedure,  which  makes  use  of  mutable  data  structures,  and 
by  an  arithmetic  simplification  procedure. 

4.1  Introducing:  the  conversion  rule 

First, let us present what the conversion rule really is in more detail. We will base our discussion on a simple type-theoretic higher-order logic, based on the $\lambda$HOL logic as described in Barendregt and Geuvers [1999], and used in our original work on VeriML [Stampoulis and Shao 2010]. We can think of such a logic as composed of the following broad classes: the objects of the domain of discourse $d$, which are the objects that the logic reasons about, such as natural numbers and lists; their classifiers, the kinds $\mathcal{K}$ (classified in turn by sorts $s$); the propositions $P$; and the derivations, which prove that a certain proposition is true. We can represent derivations in a linear form as terms $\pi$ in a typed lambda-calculus; we call such terms proof objects, and their types represent propositions in the logic. Checking whether a derivation is a valid proof of a certain proposition amounts to type-checking its corresponding proof object. Some details of this logic are presented in Fig. 6; the interested reader can find more information about it in the above references and in our technical report [Stampoulis and Shao 2012].

$$\frac{\Psi;\ \Phi \vdash_e d_1 : \mathcal{K} \qquad \Psi;\ \Phi \vdash_e d_2 : \mathcal{K}}{\Psi;\ \Phi \vdash_e d_1 = d_2 : \mathsf{Prop}}$$

$$\frac{\Psi;\ \Phi, x : \mathcal{K} \vdash_e P : \mathsf{Prop} \qquad \Psi;\ \Phi \vdash_e d_1 : \mathcal{K} \qquad \Psi;\ \Phi \vdash_e \pi : P[d_1/x] \qquad \Psi;\ \Phi \vdash_e \pi' : d_1 = d_2}{\Psi;\ \Phi \vdash_e \mathsf{leibniz}\ (\lambda x : \mathcal{K}.P)\ \pi\ \pi' : P[d_2/x]}$$

$$\frac{\Psi;\ \Phi, x : \mathcal{K} \vdash_e \pi : d_1 = d_2}{\Psi;\ \Phi \vdash_e \mathsf{lamEq}\ (\lambda x : \mathcal{K}.\pi) : (\lambda x : \mathcal{K}.d_1) = (\lambda x : \mathcal{K}.d_2)}$$

$$\frac{\Psi;\ \Phi, x : \mathcal{K} \vdash_e \pi : d_1 = d_2 \qquad \Psi;\ \Phi \vdash_e d_1 : \mathsf{Prop}}{\Psi;\ \Phi \vdash_e \mathsf{forallEq}\ (\lambda x : \mathcal{K}.\pi) : (\forall x : \mathcal{K}.d_1) = (\forall x : \mathcal{K}.d_2)}$$

$$\frac{\Psi;\ \Phi, x : \mathcal{K} \vdash_e d : \mathcal{K}' \qquad \Psi;\ \Phi \vdash_e d' : \mathcal{K}}{\Psi;\ \Phi \vdash_e \mathsf{betaEq}\ (\lambda x : \mathcal{K}.d)\ d' : (\lambda x : \mathcal{K}.d)\ d' = d[d'/x]}$$

Axioms assumed:

$\mathsf{natElimBase}_{\mathcal{K}} : \forall f_z.\forall f_s.\ \mathsf{natElim}_{\mathcal{K}}\ f_z\ f_s\ \mathsf{zero} = f_z$
$\mathsf{natElimStep}_{\mathcal{K}} : \forall f_z.\forall f_s.\forall n.\ \mathsf{natElim}_{\mathcal{K}}\ f_z\ f_s\ (\mathsf{succ}\ n) = f_s\ n\ (\mathsf{natElim}_{\mathcal{K}}\ f_z\ f_s\ n)$

Figure 8. Extending $\lambda$HOL with explicit equality ($\lambda$HOL$_e$)

In Fig. 7, we show what the conversion rule looks like for this logic: it is a typing judgement that effectively identifies propositions up to an equivalence relation, with respect to checking proof objects. We call this version of the logic $\lambda$HOL$_c$ and use $\vdash_c$ to denote its entailment relation. The equivalence relation we consider in the conversion rule is evaluation up to $\beta$-reductions and uses of primitive recursion of natural numbers, denoted as natElim. In this way, trivial arguments based on this notion of computation alone need not be witnessed, as for example is the fact that $(\mathsf{Succ}\ x) + y = \mathsf{Succ}\ (x + y)$ when the addition function is defined by primitive recursion on the first argument. Of course, this is only a very basic use of the conversion rule. It is possible to omit larger proofs through much more sophisticated uses. This leads to simpler proofs and smaller proof objects.
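To make the addition example concrete, the following derivation spells out the reduction steps that the conversion rule leaves implicit. It is a sketch that assumes addition is defined by primitive recursion on its first argument as $(+) = \lambda x.\lambda y.\ \mathsf{natElim}_{\mathsf{Nat}}\ y\ (\lambda p.\lambda r.\ \mathsf{Succ}\ r)\ x$ and uses only $\beta$-reduction and the natElim step rule; writing $f_s$ for $\lambda p.\lambda r.\ \mathsf{Succ}\ r$:

```latex
\begin{align*}
(\mathsf{Succ}\ x) + y
  &\to_{\beta}^{*} \mathsf{natElim}_{\mathsf{Nat}}\ y\ f_s\ (\mathsf{Succ}\ x)
     && \text{unfolding } (+) \\
  &\to_{N} f_s\ x\ (\mathsf{natElim}_{\mathsf{Nat}}\ y\ f_s\ x)
     && \text{natElim step rule} \\
  &\to_{\beta}^{*} \mathsf{Succ}\ (\mathsf{natElim}_{\mathsf{Nat}}\ y\ f_s\ x)
     && \text{applying } f_s \\
  &=_{\beta} \mathsf{Succ}\ (x + y)
     && \text{folding } (+)
\end{align*}
```

Each line is related to the next by $=_{\beta N}$, so the two propositions are identified by the conversion rule and no explicit proof is recorded.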

Still, when using this approach, the choice of what relation is supported by the conversion rule needs to be made during the definition of the logic. This choice permeates all aspects of the metatheory of the logic. It is easy to see why, even with the tiny fragment of logic we have introduced. Most typing rules for proof objects in the logic are similar to the rules →INTRO and →ELIM: they are syntax-directed. This means that upon seeing the associated proof object constructor, like $\lambda x : P.\pi$ in the case of →INTRO, we can directly tell that it applies. If all rules were syntax-directed, it would be entirely simple to prove that the logic is sound by an inductive argument: essentially, since no proof constructor for False exists, there is no valid derivation for False.

    pNequal : (φ : ctx, T : Type, t1 : T, t2 : T) → option LT(t1 = t2)
    pNequal φ T t1 t2 =
      holcase whnf φ T t1, whnf φ T t2 of
          ((ta : T' → T) tb), (tc td) ↦
            do ⟨pf1⟩ ← pNequal φ (T' → T) ta tc
               ⟨pf2⟩ ← pNequal φ T' tb td
               return ⟨··· proof of ta tb = tc td ···⟩
        | (ta → tb), (tc → td) ↦
            do ⟨pf1⟩ ← pNequal φ Prop ta tc
               ⟨pf2⟩ ← pNequal φ Prop tb td
               return ⟨··· proof of ta → tb = tc → td ···⟩
        | (λx : T.t1), (λx : T.t2) ↦
            do ⟨pf⟩ ← pNequal [φ, x : T] Prop t1 t2
               return ⟨··· proof of λx : T.t1 = λx : T.t2 ···⟩
        | t1, t1 ↦ do return ⟨··· proof of t1 = t1 ···⟩
        | t1, t2 ↦ None

    requireEqual : (φ : ctx, T : Type, t1 : T, t2 : T) → LT(t1 = t2)
    requireEqual φ T t1 t2 =
      match pNequal φ T t1 t2 with Some x ↦ x | None ↦ error

Figure 9. VeriML tactic for checking equality up to βN-conversion

In this logic, the only rule that is not syntax-directed is exactly the conversion rule. Therefore, in order to prove the soundness of the logic, we have to show that the conversion rule does not somehow introduce a proof of False. This means that proving the soundness of the logic passes essentially through the specific relation we have chosen for the conversion rule. Therefore, this approach is foundationally limited from supporting user extensions, since any new extension would require a new metatheoretic result in order to make sure that it does not violate logical soundness.

4.2  Throwing  conversion  away 

Since  having  a  fixed  conversion  rule  is  bound  to  fail  if  we  want 
it  to  be  extensible,  what  choice  are  we  left  with,  but  to  throw  it 
away?  This  radical  sounding  approach  is  what  we  will  do  here.  We 
can  replace  the  conversion  rule  by  an  explicit  notion  of  equality, 
and  provide  explicit  proof  witnesses  for  rewriting  based  on  that 
equality. Essentially, all the points where the conversion rule was alluded to and proofs were omitted now need to be replaced by proof objects witnessing the equivalence. Some details for the additions required to the base $\lambda$HOL logic are shown in Fig. 8, yielding the $\lambda$HOL$_e$ logic. There are good reasons for choosing this version:
first,  the  proof  checker  is  as  simple  as  possible,  and  does  not  need 
to  include  the  conversion  checking  routine.  We  could  view  this 
routine  as  performing  proof  search  over  the  replacement  rules, 
so  it  necessarily  is  more  complicated,  especially  since  it  needs 
to  be  relatively  efficient.  Also,  the  metatheory  of  the  logic  itself 
can  be  simplified.  Even  when  the  conversion  rule  is  supported,  the 
metatheory  for  the  associated  logic  is  proved  through  the  explicit 
equality  approach;  this  is  because  model  construction  for  a  logic 
benefits  from  using  explicit  equality  [Siles  and  Herbelin  2010]. 

Still, this approach has a big disadvantage: the proof objects soon become extremely large, since they include painstakingly detailed proofs for even the simplest of equivalences. This precludes their use as independently checkable proof certificates that can be sent to a third party. It is possible that this is one of the reasons why systems based on logics with explicit equalities, such as HOL4 [Slind and Norrish 2008] and Isabelle/HOL [Nipkow et al. 2002], do not generate proof objects by default.

    whnf : (φ : ctx, T : Type, t : T) → (t' : T) × LT(t = t')
    whnf φ T t = holcase t of
        (t1 : T' → T) (t2 : T') ↦
          let ⟨t1', pf1⟩ = whnf φ (T' → T) t1 in
          holcase t1' of
              λx : T'.tf ↦ ⟨[φ] tf/[id_φ, t2], ···⟩
            | t1'        ↦ ⟨[φ] t1' t2, ···⟩
      | natElim_K fz fs n ↦
          let ⟨n', pf2⟩ = whnf φ Nat n in
          holcase n' of
              zero    ↦ ⟨[φ] fz, ···⟩
            | succ n' ↦ ⟨[φ] fs n' (natElim_K fz fs n'), ···⟩
            | n'      ↦ ⟨[φ] natElim_K fz fs n', ···⟩
      | t ↦ ⟨t, ···⟩

Figure 10. VeriML tactic for rewriting to weak head-normal form

4.3  Getting  conversion  back 

We will now see how it is possible to reconcile the explicit equality based approach with the conversion rule: we will gain the conversion rule back, although it will remain completely outside the logic. Therefore we will be free to extend it without risking introducing unsoundness in the logic, since the logic remains fixed ($\lambda$HOL$_e$ as presented above).

We  do  this  by  revisiting  the  view  of  the  conversion  rule  as  a 
special  “trusted”  tactic,  through  the  tools  presented  in  the  previous 
section.  First,  instead  of  hardcoding  a  conversion  tactic  in  the  type 
checker,  we  program  a  type-safe  conversion  tactic,  utilizing  the 
features  of  VeriML.  Based  on  typing  alone  we  require  that  it  returns 
a  valid  proof  of  the  claimed  equivalences: 

    pNequal : (φ : ctx, T : Type, t : T, t' : T) → option LT(t = t')

Second,  we  evaluate  this  tactic  under  proof  erasure  semantics.  This 
means  that  no  proof  objects  are  produced,  leading  to  the  same  space 
gains  as  the  original  conversion  rule.  Third,  we  use  the  staging 
construct  in  order  to  check  conversion  statically. 

Details. We now present our approach in more detail. First, in Fig. 9, we show a sketch of the code behind the type-safe conversion check tactic. It works by first rewriting its input terms into weak head-normal form, via the whnf function in Fig. 10, and then recursively checking their subterms for equality. In the equivalence checking function, more cases are needed to deal with quantification; while in the rewriting procedure, a recursive call is missing, which would complicate our presentation here. We also define a version of the tactic that raises an error instead of returning an option type if we fail to prove the terms equal, which we call requireEqual. The full details can be found in our implementation.
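The overall shape of this procedure (normalize both sides to weak head-normal form, compare head constructors, recurse on subterms, and return a witness on success) can be sketched in Python. Everything below is an illustrative assumption rather than the VeriML code: terms are untyped nested tuples, witnesses are tagged tuples with made-up names, bound variable names are assumed to be distinct so that substitution is capture-free, and because terms are untyped the functions are not guaranteed to terminate on arbitrary input.

```python
from typing import Optional

# Hypothetical term encoding (nested tuples):
#   ("var", x) | ("lam", x, body) | ("app", f, a)
#   ("zero",)  | ("succ", n)      | ("natElim", fz, fs, n)


def subst(t, x, s):
    """Replace variable x by s in t (assumes all bound names are distinct)."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return (tag,) + tuple(subst(c, x, s) for c in t[1:])


def whnf(t):
    """Rewrite t to weak head-normal form; return (t', witness of t = t')."""
    if t[0] == "app":
        f, pf = whnf(t[1])
        if f[0] == "lam":                       # beta step
            r, pf2 = whnf(subst(f[2], f[1], t[2]))
            return r, ("trans", ("appCong", pf), ("beta",), pf2)
        return ("app", f, t[2]), ("appCong", pf)
    if t[0] == "natElim":
        _, fz, fs, n = t
        n2, pf = whnf(n)
        if n2[0] == "zero":                     # natElim base rule
            r, pf2 = whnf(fz)
            return r, ("trans", ("elimCong", pf), ("natElimBase",), pf2)
        if n2[0] == "succ":                     # natElim step rule
            step = ("app", ("app", fs, n2[1]), ("natElim", fz, fs, n2[1]))
            r, pf2 = whnf(step)
            return r, ("trans", ("elimCong", pf), ("natElimStep",), pf2)
        return ("natElim", fz, fs, n2), ("elimCong", pf)
    return t, ("refl",)


def pn_equal(t1, t2) -> Optional[tuple]:
    """Return a witness that t1 and t2 are equal up to beta/natElim, or None."""
    a, pa = whnf(t1)
    b, pb = whnf(t2)
    if a[0] != b[0]:
        return None
    if a[0] == "var":
        kids = [] if a[1] == b[1] else None
    elif a[0] == "lam":                         # same binder name required
        kids = [(a[2], b[2])] if a[1] == b[1] else None
    else:
        kids = list(zip(a[1:], b[1:]))
    if kids is None:
        return None
    proofs = []
    for x, y in kids:
        p = pn_equal(x, y)
        if p is None:
            return None
        proofs.append(p)
    return ("conv", pa, ("cong", a[0], *proofs), pb)
```

Under proof erasure, the witness tuples would simply not be built, which leaves exactly the comparison a built-in conversion check performs.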

The  code  of  the  pNequal  tactic  is  in  fact  entirely  similar  to 
the  code  one  would  write  for  the  conversion  check  routine  inside 
a  logic  type  checker,  save  for  the  extra  types  and  proof  objects.  It 
therefore  follows  trivially  that  everything  that  holds  for  the  standard 
implementation  of  the  conversion  check  also  holds  for  this  code: 
e.g. it corresponds exactly to the $=_{\beta N}$ relation as defined in the
logic;  it  is  bound  to  terminate  because  of  the  strong  normalization 
theorem  for  this  relation;  and  its  proof-erased  version  is  at  least  as 
trustworthy  as  the  standard  implementation. 

Furthermore, given this code, we can produce a form of typed proof scripts inside VeriML that correspond exactly to proof objects in the logic with the conversion rule, both in terms of their actual code, and in terms of the steps required to validate them. This is done by constructing a proof script in VeriML by induction on the derivation of the proof object in $\lambda$HOL$_c$, replacing each proof object constructor by an equivalent VeriML tactic as follows:


    constructor     to tactic      of type
    λx : P.π        Assume e       LT([φ, H : P] P') → LT(P → P')
    π1 π2           Apply e1 e2    LT(P → P') → LT(P) → LT(P')
    λx : K.π        Intro e        LT([φ, x : T] P') → LT(∀x : T, P')
    π d             Inst e a       LT(∀x : T, P) → (a : T) → LT(P/[id, a])
    c               Lift c         (H : P) → LT(P)
    (conversion)    Conversion     LT(P) → LT(P = P') → LT(P')

Here we have omitted the current logical environment $\phi$; it is maintained through syntactic means as discussed in Sec. 7 and through type inference. The only subtle case is conversion. Given the transformed proof e for the proof object $\pi$ contained within a use of the conversion rule, we call the conversion tactic as follows:

letstatic  pf  =  requireEqual  P  P'  in  Conversion  e  pf 

The arguments to requireEqual can be easily inferred, making crucial use of the rich type information available. Conversion could also be used implicitly in the other tactics. Thus the resulting expression looks entirely identical to the original proof object.

Correspondence with original proof object. In order to elucidate the correspondence between the resulting proof script expression and the original proof object, it is fruitful to view the proof script as a proof certificate, sent to a third party. The steps required to check whether it constitutes a valid proof are the following. First, the whole expression is checked using the type checker of the computational language. Then, the calls to the requireEqual function are evaluated during stage one, using proof erasure semantics. We expect them to be successful, just as we would expect the conversion rule to be applicable when it is used. Last, the rest of the tactics are evaluated; by a simple argument, based on the fact that they do not use pattern matching or side-effects, they are guaranteed to terminate and produce a proof object in $\lambda$HOL$_e$. This validity check is entirely equivalent to the behavior of type-checking the $\lambda$HOL$_c$ proof object, save for pushing all conversion checks towards the end.

4.4  Extending  conversion  at  will 

In our treatment of the conversion rule we have so far focused on regaining the βN conversion in our framework. Still, there is nothing confining us to supporting this conversion check only. As long as we can program a conversion tactic in VeriML that has the right type, it can safely be made part of our conversion rule.

For example, we have written an eufEqual function, which checks terms for equivalence based on the equality with uninterpreted functions decision procedure. It is adapted from our previous work on VeriML [Stampoulis and Shao 2010]. This equivalence checking tactic isolates hypotheses of the form $d_1 = d_2$ from the current context, using the newly-introduced context matching support. Then, it constructs a union-find data structure in order to form equivalence classes of terms. Based on this structure, and using code similar to pNequal (recursive calls on subterms), we can decide whether two terms are equal up to simple uses of the equality hypotheses at hand. We have combined this tactic with the original pNequal tactic, making the implicit equivalence supported similar to the one in the Calculus of Congruent Constructions [Blanqui et al. 2005]. This demonstrates the flexibility of this approach: equivalence checking is extended with a sophisticated decision procedure, which is programmed using its original, imperative formulation. We have programmed both the rewriting procedure and the equality checking procedure in an extensible manner, so that we can globally register further extensions.
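The combination of a union-find structure with recursive calls on subterms can be sketched in Python as follows. This is a simplified illustration, not the eufEqual tactic itself: the term encoding (strings for constants, tuples for applications), the class UnionFind and the function euf_equal are all assumptions made here, no proof witnesses are produced, and the check covers only simple uses of the hypotheses rather than a complete congruence closure.

```python
class UnionFind:
    """Mutable union-find over hashable terms."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)


def euf_equal(hyps, t1, t2):
    """Decide t1 = t2 up to simple uses of the equality hypotheses `hyps`.

    Terms are strings (constants) or tuples ("f", arg1, ..., argn).
    """
    uf = UnionFind()
    for left, right in hyps:          # one equivalence class per hypothesis chain
        uf.union(left, right)

    def eq(a, b):
        if uf.find(a) == uf.find(b):  # same class (includes a == b)
            return True
        # Congruence: same head symbol and pairwise-equal arguments.
        return (isinstance(a, tuple) and isinstance(b, tuple)
                and len(a) == len(b) and a[0] == b[0]
                and all(eq(x, y) for x, y in zip(a[1:], b[1:])))

    return eq(t1, t2)
```

For instance, with the single hypothesis a = b the sketch accepts f(a) = f(b). It does not propagate derived equalities back into the equivalence classes, so given a = b and f(a) = c it would not conclude f(b) = c; a complete procedure would.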


4.5  Typed  proof  scripts  as  certificates 

Earlier  we  discussed  how  we  can  validate  the  proof  scripts  resulting 
from  turning  the  conversion  rule  into  explicit  tactic  calls.  This 
discussion  shows  an  interesting  aspect  of  typed  proof  scripts:  they 
can  be  viewed  as  a  proof  witness  that  is  a  flexible  compromise 
between  untyped  proof  scripts  and  proof  objects.  When  a  typed 
proof  script  consists  only  of  static  calls  to  conversion  tactics  and 
uses  of  total  tactics,  it  can  be  thought  of  as  a  proof  object  in  a 
logic  with  the  corresponding  conversion  rule.  When  it  also  contains 
other  tactics,  that  perform  potentially  expensive  proof  search,  it 
corresponds  more  closely  to  an  untyped  proof  script,  since  it  needs 
to  be  fully  evaluated.  Still,  we  are  allowed  to  validate  parts  of 
it  statically.  This  is  especially  useful  when  developing  the  proof 
script,  because  we  can  avoid  the  evaluation  of  expensive  tactic  calls 
while  we  focus  on  getting  the  skeleton  of  the  proof  correct. 

Using proof erasure for evaluating requireEqual is only one of the choices the receiver of such a proof certificate can make. Another choice would be to have the function return an actual proof object, which we can check using the $\lambda$HOL$_e$ type checker. In that case, the VeriML interpreter does not need to become part of the trusted base of the system. Last, the ‘safest possible’ choice would be to avoid doing any evaluation of the function, and ask the proof certificate provider to do the evaluation of requireEqual themselves. In that case, no evaluation of computational code would need to happen at the proof certificate receiver’s side. This mitigates any concerns one might have about code execution as part of proof validity checking, and guarantees that the small $\lambda$HOL$_e$ type checker is the trusted base in its entirety. Also, the receiver can decide on the above choices selectively for different conversion tactics, e.g. use proof erasure for pNequal but not for eufEqual, leading to a trusted base identical to the $\lambda$HOL$_c$ case. This means that the choice of the conversion rule rests with the proof certificate receiver and not with the designer of the logic. Thus the proof certificate receiver can choose the level of trust they require at will.

5.  Static  proof  scripts 

In  the  previous  section,  we  have  demonstrated  how  proof  checking 
for  typed  proof  scripts  can  be  made  user-extensible,  through  a 
new  treatment  of  the  conversion  rule.  It  makes  use  of  user-defined, 
type-safe  tactics,  which  are  evaluated  statically.  The  question  that 
remains  is  what  happens  with  respect  to  proofs  within  tactics.  If 
a  proof  script  is  found  within  a  tactic,  must  we  wait  until  that 
evaluation  point  is  reached  to  know  whether  the  proof  script  is 
correct  or  not?  Or  is  there  a  way  to  check  this  statically,  as  soon 
as  the  tactic  is  defined? 

In this section we show how this is possible to do in VeriML using the staging construct we have introduced. Still, in this case matters are not as simple as evaluating certain expressions statically rather than dynamically. The reason is that proof scripts contained within tactics mention uninstantiated meta-variables, and thus cannot be evaluated through staging. We resolve this by showing the existence of a transformation, which “collapses” logical terms from an arbitrary meta-variables context into the empty one.

We will focus on the case of developing conversion routines, similar to the ones we saw earlier. The ideas we present are generally applicable when writing other types of tactics as well; we focus on conversion routines in order to demonstrate that the two main ideas we present in this paper can work in tandem.

A rewriter for plus. We will consider the case of writing a rewriter, similar to whnf, for simplifying expressions of the form $x + y$, depending on the second argument. The addition function is defined by induction on the first argument, as follows:

    (+) = λx.λy.natElim_Nat y (λp.λr.Succ r) x

In order for rewriters to be able to use existing as well as future rewriters to perform their recursive calls, we write them in the open recursion style: they receive a function of the same type that corresponds to the “current” rewriter. The code looks as follows:

    rewriterType = (φ : ctx, T : Type, t : T) → (t' : T) × LT(t = t')
    plusRewriter1 : rewriterType → rewriterType
    plusRewriter1 recursive φ T t = holcase t with
        x + y ↦
          let ⟨y', ⟨pfy'⟩⟩ = recursive φ y in
          let ⟨t', ⟨pft'⟩⟩ =
            holcase y' return Σt' : [φ]Nat.LT([φ] x + y' = t') of
                0       ↦ ⟨x, ··· proof of x + 0 = x ···⟩
              | Succ y' ↦ ⟨Succ (x + y'),
                           ··· proof of x + Succ y' = Succ (x + y') ···⟩
              | y'      ↦ ⟨x + y', ··· proof of x + y' = x + y' ···⟩
          in ⟨t', ⟨··· proof of x + y = t' ···⟩⟩
      | t ↦ ⟨t, ··· proof of t = t ···⟩
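The open recursion style can be illustrated with the following Python sketch, in which each rewriter takes the "current" rewriter as its first argument. The encoding is an assumption made for illustration: terms are strings or tuples such as ("+", x, y) and ("Succ", y), proofs are replaced by descriptive tags, and the helper tie, which closes the recursion over a list of registered rewriters, is a hypothetical name that does not appear in the original.

```python
def plus_rewriter(recursive, t):
    """Simplify x + y by inspecting the (already rewritten) second argument.

    `recursive` is the current rewriter; it returns (term', proof_tag).
    """
    if isinstance(t, tuple) and t[0] == "+":
        _, x, y = t
        y2, pf_y = recursive(y)
        if y2 == "0":
            return x, ("trans", ("congR", pf_y), "plus_zero")
        if isinstance(y2, tuple) and y2[0] == "Succ":
            result = ("Succ", ("+", x, y2[1]))
            return result, ("trans", ("congR", pf_y), "plus_succ")
        return ("+", x, y2), ("congR", pf_y)
    return t, "refl"


def tie(rewriters):
    """Close the open recursion over a list of rewriters.

    The resulting rewriter applies the first registered rewriter that changes
    the term and repeats until none makes progress (termination is assumed).
    """
    def current(t):
        proof = "refl"
        progress = True
        while progress:
            progress = False
            for rewrite in rewriters:
                t2, pf = rewrite(current, t)
                if t2 != t:
                    t, proof, progress = t2, ("trans", proof, pf), True
                    break
        return t, proof
    return current
```

Because plus_rewriter never calls itself directly, registering a further rewriter in the list passed to tie makes it available to the recursive calls of every existing rewriter without changing their code.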

While developing such a tactic, we can leverage the VeriML type checker to know the types of missing proofs. But how do we fill them in? For the interesting cases of $x + 0 = x$ and $x + \mathsf{Succ}\ y' = \mathsf{Succ}\ (x + y')$, we would certainly need to prove the corresponding lemmas. But for the rest of the cases, the corresponding lemmas would be uninteresting and tedious to state, such as the following for the $x + y = t'$ case:

    lemma1 : ∀x, y, y', t'. y = y' → (x + y' = t') → x + y = t'

Stating  and  proving  such  lemmas  soon  becomes  a  hindrance  when 
writing  tactics.  An  alternative  is  to  use  the  congruence  closure 
conversion  rule  to  solve  this  trivial  obligation  for  us  directly  at  the 
point  where  it  is  required.  Our  first  attempt  would  be: 

    proof of x + y = t' =
      let ⟨pf⟩ = requireEqual [φ, H1 : y = y', H2 : x + y' = t'] (x + y) t'
      in ⟨[φ] pf/[id_φ, pfy', pft']⟩

The benefit of this approach is evident when utilizing implicit arguments, since most of the details can be inferred and therefore omitted. Here we had to alter the environment passed to requireEqual, which includes several extra hypotheses. Once the resulting proof has been computed, the hypotheses are substituted by the actual proofs that we have.

The problem with this approach is two-fold: first, the call to the requireEqual tactic is recomputed every time we reach that point of our function. For such a simple tactic call, this does not impact the runtime significantly; still, if we could avoid it, we would be able to use more sophisticated and expensive tactics. The second problem is that if for some reason requireEqual is not able to prove what it is supposed to, we will not know until we actually reach that point in the function.

Moving to static proofs. This is where using the letstatic construct becomes essential. We can evaluate the call to requireEqual statically, during stage one interpretation. Thus we will know at the time that plusRewriter1 is defined whether the call succeeded; also, it will be replaced by a concrete value, so it will not affect the runtime behavior of each invocation of plusRewriter1 anymore. To do that, we need to avoid mentioning any of the metavariables that are bound during runtime, like $x$, $y$, and $t'$. This is done by specifying an appropriate environment in the call to requireEqual, similarly to the way we incorporated the extra knowledge above and substituted it later. Using this approach, we have:

proof of x + y = t' =
  letstatic ⟨pf⟩ =
    let φ' = [x, y, y', t' : Nat, H1 : y = y', H2 : x + y' = t'] in
    requireEqual φ' (x + y) t'
  in ⟨[φ] pf/[x/id_φ, y/id_φ, y'/id_φ, t'/id_φ, pf_y'/id_φ, pf_t'/id_φ]⟩

What we are essentially doing here is replacing the meta-variables
by normal logical variables, which our tactics can deal with. The
meta-variable context is "collapsed" into a normal context; proofs
are constructed using tactics in this environment; last, the
resulting proofs are transported back into the desired context by
substituting meta-variables for variables. We have explicitly stated
the substitutions in order to distinguish between normal logical
variables and meta-variables.
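
The difference between the two versions can be sketched outside VeriML.
The following Python fragment is only an illustration under stated
assumptions: Python has no letstatic, so "stage one" is emulated by
running the prover once when the module is loaded; require_equal,
rewrite and instantiate are hypothetical stand-ins (a naive,
potentially non-terminating rewriter, not a real congruence closure),
and "proofs" are plain tuples rather than checked proof objects.

```python
prover_calls = {"count": 0}  # counts how often the toy prover runs

def rewrite(term, hyps):
    """Naively rewrite `term` left-to-right with the equations in `hyps`.
    Like the naive rewriter in the text, this may loop on bad inputs."""
    if isinstance(term, tuple):
        term = tuple(rewrite(sub, hyps) for sub in term)
    for lhs, rhs in hyps:
        if term == lhs:
            return rewrite(rhs, hyps)
    return term

def require_equal(hyps, lhs, rhs):
    """Hypothetical stand-in for requireEqual: return a proof token or raise."""
    prover_calls["count"] += 1
    if rewrite(lhs, hyps) != rewrite(rhs, hyps):
        raise ValueError(f"cannot prove {lhs} = {rhs}")
    return ("eq", lhs, rhs)

def instantiate(proof, subst):
    """Simultaneously substitute variables inside a proof token."""
    if isinstance(proof, tuple):
        return tuple(instantiate(p, subst) for p in proof)
    return subst.get(proof, proof)

def dynamic_tactic(x, y, y1, t1):
    # First attempt: the prover runs on every call, over run-time terms.
    hyps = [(y, y1), (("+", x, y1), t1)]
    return require_equal(hyps, ("+", x, y), t1)

# Static version: prove once, at definition time, over ordinary variables.
# A failure here would surface immediately, not at some later call.
_generic = require_equal(
    [("y", "y'"), (("+", "x", "y'"), "t'")], ("+", "x", "y"), "t'")

def static_tactic(x, y, y1, t1):
    # At run time only the substitution back into the real context remains.
    return instantiate(_generic, {"x": x, "y": y, "y'": y1, "t'": t1})
```

Both tactics return the same token for the same arguments, but only
dynamic_tactic increments the call counter after definition time.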

The reason why this transformation needs to be done is that
functions in our computational language can only manipulate logical
terms that are open with respect to a normal variables context, not
logical terms that are open with respect to the meta-variables
context too. A much more complicated, but also more flexible
alternative to using this "collapsing" trick would be to support
meta-n-variables within our computational language directly.

Overall, this approach is entirely similar to proving the auxiliary
lemma mentioned above, prior to the tactic definition. The benefit
is that by leveraging the type information together with type
inference, we can avoid stating such lemmas explicitly, while
retaining the same runtime behavior. We thus end up with very concise
proof expressions that are statically validated. We introduce
syntactic sugar for binding a static proof script to a variable, and
then performing a substitution to bring it into the current context,
since this is a common operation.

⟨e⟩_static ≡ letstatic ⟨pf⟩ = e in ⟨[φ] pf/···⟩

Based on these, the trivial proofs in the above tactic can be filled
in using a simple ⟨requireEqual⟩_static call; for the other two we use
⟨Instantiate (NatInduction requireEqual requireEqual) x⟩_static.

After we define plusRewriter1, we can register it with the
global equivalence checking procedure. Thus, all later calls to
requireEqual will benefit from this simplification. It is then
simple to prove commutativity for addition:

plusComm : LT(∀x, y. x + y = y + x)
plusComm = NatInduction requireEqual requireEqual

Based on this proof, we can write a rewriter that takes
commutativity into account and uses the hash values of logical terms
to avoid infinite loops. We have worked on an arithmetic
simplification rewriter that is built by layering such rewriters
together, using previous ones to aid us in constructing the proofs
required in later ones. It works by converting expressions into a
list of monomials, sorting the list based on the hash values of the
variables, and then factoring monomials on the same variable. Also,
the eufEqual procedure mentioned earlier has all of its associated
proofs automated through static proof scripts, using a naive,
potentially non-terminating, equality rewriter.
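
The three steps of the arithmetic simplifier (flatten to monomials,
sort by hash, factor on the same variable) can be sketched as follows.
This is a minimal Python illustration, not the VeriML rewriter: it
handles only sums and multiplication by integer constants, it produces
no proofs, and the term encoding and the use of zlib.crc32 as the hash
are assumptions made for the example.

```python
import zlib

def var_hash(name):
    # Deterministic stand-in for the hash value of a logical term.
    return zlib.crc32(name.encode("utf-8"))

def monomials(expr):
    """Flatten an expression into (coefficient, variable) pairs.
    Integers are constants (variable None); strings are variables."""
    if isinstance(expr, int):
        return [(expr, None)]
    if isinstance(expr, str):
        return [(1, expr)]
    op, left, right = expr
    if op == "+":
        return monomials(left) + monomials(right)
    if op == "*" and isinstance(left, int):
        return [(left * c, v) for c, v in monomials(right)]
    raise ValueError(f"unsupported expression: {expr!r}")

def sort_key(monomial):
    _, var = monomial
    # Constants first; variables ordered by hash, with the name as tie-break.
    return (-1, "") if var is None else (var_hash(var), var)

def simplify(expr):
    """Return a canonical, sorted list of monomials for `expr`."""
    combined = []
    for coeff, var in sorted(monomials(expr), key=sort_key):
        if combined and combined[-1][1] == var:
            # Factor monomials on the same variable.
            combined[-1] = (combined[-1][0] + coeff, var)
        else:
            combined.append((coeff, var))
    return [m for m in combined if m[0] != 0]
```

Because the order is fixed by the hash, x + y and y + x simplify to the
same list, which is how sorting sidesteps the looping that a bare
commutativity rule would cause.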

Is collapsing always possible? A natural question to ask is
whether collapsing the metavariables context into a normal context
is always possible. In order to cast this as a more formal question,
we notice that the essential step is replacing a proof object π
of type [Φ] t, typed under the meta-variables environment Ψ, by a
proof object π' of type [Φ'] t' typed under the empty meta-variables
environment. There needs to be a substitution so that π' gets
transported back to the Φ, Ψ environment, and has the appropriate type.

[Figure 11, in four panels: (a) Hybrid deBruijn levels-deBruijn indices
representation technique (syntax of the logic, freshening and binding);
(b) Extension variables: meta-variables and context variables;
(c) Substitutions over logical variables and extension variables;
(d) Computational language: staging support.]

Figure 11. Main definitions in metatheory

We have proved that this is possible under certain restrictions:
the types of the metavariables in the current context need to depend
on the same free variables context Φmax, or prefixes of that context.
Also the substitutions they are used with need to be prefixes of
the identity substitution for Φmax. Such terms are characterized as
collapsible. We have proved that collapsible terms can be replaced
using terms that do not make use of metavariables; more details
can be found in Sec. 6 and in the accompanying technical report
[Stampoulis and Shao 2012].

This restriction corresponds very well to the treatment of variable
contexts in the Delphin language. This language assumes an
ambient context of logical variables, instead of full, contextual
modal terms. Constructs to extend this context and substitute a
specific variable exist. If this last feature is not used, the
ambient context grows monotonically and the mentioned restriction
holds trivially. In our tests, this restriction has not turned out
to be limiting.

6.  Metatheory 

We have completed an extensive reworking of the metatheory of
VeriML, in order to incorporate the features that we have presented
in this paper. Our new metatheory includes a number of technical
advances compared to our earlier work [Stampoulis and Shao
2010]. We will present a technical overview of our metatheory in
this section; full details can be found in our technical report
[Stampoulis and Shao 2012].

Variable representation technique. Though our metatheory is
done on paper, we have found that using a concrete variable
representation technique elucidates some aspects of how different
kinds of substitutions work in our language, compared to having
normal named variables. For example, instantiating a context variable
with a concrete context triggers a set of potentially complicated
α-renamings, which a concrete representation makes explicit. We
use a hybrid technique representing bound variables as deBruijn
indices, and free variables as deBruijn levels. Our technique is a
small departure from the named approach, requiring fewer extra
annotations and lemmas than normal deBruijn indices. Also it identifies
terms not only up to α-equivalence, but also up to extension of the
context with new variables; this is why it is also used within the
VeriML implementation. The two fundamental operations of this
technique are freshening and binding, which are shown in Fig. 11a.
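
The two operations can be illustrated with a small Python sketch. It
is an assumption-laden simplification, not a transcription of Fig. 11a:
λ-abstractions carry no type annotation, only variables, abstraction
and application are modelled, and the class and function names are
made up for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Free:
    level: int   # deBruijn level: position in the context, from the left

@dataclass(frozen=True)
class Bound:
    index: int   # deBruijn index: binders between the use and its binder

@dataclass(frozen=True)
class Lam:
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def freshen(term, m, n=0):
    """Open a binder body: bound index n becomes the new free level m."""
    if isinstance(term, Bound):
        return Free(m) if term.index == n else term
    if isinstance(term, Lam):
        return Lam(freshen(term.body, m, n + 1))
    if isinstance(term, App):
        return App(freshen(term.fn, m, n), freshen(term.arg, m, n))
    return term  # free variables are untouched

def bind(term, m, n=0):
    """Close over the last variable of a context of length m (level m - 1)."""
    if isinstance(term, Free):
        return Bound(n) if term.level == m - 1 else term
    if isinstance(term, Lam):
        return Lam(bind(term.body, m, n + 1))
    if isinstance(term, App):
        return App(bind(term.fn, m, n), bind(term.arg, m, n))
    return term  # bound variables are untouched
```

Because free variables are indexed by level, a term such as Free(0)
means the same thing in every context that extends the original one,
so no shifting is needed when new variables are appended.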

Extension variables. We extend the logic with support for
meta-variables and context variables; we refer to both these sorts of
variables as extension variables. A meta-variable X_i stands for a
contextual term T = [Φ] t, which packages a term together with
the context it inhabits. Context variables φ_i stand for a context Φ,
and are used to "weaken" parametric contexts in specific positions.
Both kinds of variables are needed to support manipulation of open
logical terms. Details of their definition and typing are shown in
Fig. 11b. We use the same hybrid approach as above for representing
these variables. A somewhat subtle aspect of this extension is
that we generalize the deBruijn levels I used to index free variables,
in order to deal effectively with parametric contexts.

Substitutions. The hybrid representation technique we use for
variables renders simultaneous substitutions for all variables in
scope as the most natural choice. In Fig. 11c, we show some example
rules of how to apply a full simultaneous substitution σ to a
term t, denoted as t · σ. Similarly, we define full simultaneous
substitutions σ_Ψ for extension contexts; defining their application
has a very natural description, because of our variable representation
technique. We prove a number of substitution lemmas which have
simple statements, as shown in Fig. 11c. The proofs of these lemmas
comprise the main effort required in proving the type-safety
of a computational language such as the one we support, as they
represent the point where computation specific to logical term
manipulation takes place.
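
As an illustration of why simultaneous substitutions are convenient in
this representation, here is a small Python sketch of applying a
substitution σ, given as a list indexed by deBruijn level, to a term.
The tuple encoding and function names are assumptions of the example;
only ordinary free variables are covered, not meta-variables or
context variables.

```python
def apply_subst(term, sigma):
    """Apply the simultaneous substitution `sigma` to `term`.
    Terms: ("const", c), ("free", level), ("bound", index),
    ("lam", annotation, body), ("app", fn, arg)."""
    tag = term[0]
    if tag == "free":
        return sigma[term[1]]       # look the variable up by its level
    if tag in ("const", "bound"):
        return term                 # constants and bound variables stay put
    if tag in ("lam", "app"):
        return (tag, apply_subst(term[1], sigma), apply_subst(term[2], sigma))
    raise ValueError(f"unknown term: {term!r}")

def compose(sigma, sigma2):
    """Composition: first `sigma`, then `sigma2`."""
    return [apply_subst(t, sigma2) for t in sigma]

def identity(n):
    """Identity substitution for a context of length n."""
    return [("free", i) for i in range(n)]
```

Since bound variables are indices and substituted terms contain no
dangling indices, no shifting is performed under binders, and the
composition property (t · σ) · σ' = t · (σ · σ') can be checked
directly on examples.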

Computational language. We define an ML-style computational
language that supports dependent functions and dependent pairs
over contextual terms T, as well as pattern matching over them.
Lack of space precludes us from including details here; full details
can be found in the accompanying technical report [Stampoulis and
Shao 2012]. A fairly complete ML calculus is supported, with
mutable references and recursive types. Type safety is proved using
standard techniques; its central point is extending the logic
substitution lemmas to expressions and using them to prove progress
and preservation of dependent functions and dependent pairs. This
proof is modular with respect to the logic and other logics can
easily be supported.

Pattern matching. Our metatheory includes many extensions in
the pattern matching that is supported, as well as a new approach for
dealing with typing patterns. We include support for pattern
matching over contexts (e.g. to pick out hypotheses from the context)
and for non-linear patterns. The allowed patterns are checked through
a restriction of the usual typing rules, Ψ ⊢_p T : K.

The essential idea behind our approach to pattern matching
is to identify what the relevant variables in a typing derivation
are. Since contexts are ordered, "removing" non-relevant variables
amounts to replacing their definitions in the context with holes,
which leads us to partial contexts Ψ̂. The corresponding notion
of partial substitutions is denoted as σ̂_Ψ. Our main theorem about
pattern matching can then be stated as:

Theorem 6.1 (Decidability of pattern matching) If Ψ ⊢_p T : K,
• ⊢_p T' : K and relevant(Ψ ⊢ T : K) = Ψ̂, then either there
exists a unique partial substitution σ̂_Ψ such that • ⊢ σ̂_Ψ : Ψ̂ and
T · σ̂_Ψ = T', or no such substitution exists.
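
The shape of this result (a unique matching substitution, or none) can
be seen in a much simpler setting. The following Python sketch does
first-order matching of a pattern with possibly repeated pattern
variables against a closed term; it is an illustration only and
ignores contexts, types and partial substitutions. The ("meta", name)
encoding is an assumption of the example.

```python
def match(pattern, term, subst=None):
    """Return the unique substitution making `pattern` equal to `term`,
    or None if there is none. Pattern variables are ("meta", name)."""
    subst = {} if subst is None else subst
    if isinstance(pattern, tuple) and pattern[:1] == ("meta",):
        name = pattern[1]
        if name in subst:
            # Non-linear pattern: a repeated variable must match the same term.
            return subst if subst[name] == term else None
        return {**subst, name: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None
```

For example, the non-linear pattern X + X matches a + a with X = a,
and fails on a + b.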

Staging. Our development in this paper critically depends on the
letstatic construct we presented earlier. It can be seen as a dual of
the traditional box construct of Davies and Pfenning [1996].
Details of its typing and semantics are shown in Fig. 11d. We define a
notion of "static evaluation contexts" S, which enclose a hole of the
form letstatic x = • in e. They include normal evaluation contexts,
as well as evaluation contexts under binding structures. We evaluate
expressions e that include staging constructs using the →_s
relation; internally, this uses the normal evaluation rules, that are
used in the second stage as well, for evaluating expressions which do
not include other staging constructs. If stage-one evaluation is
successful, we are left with a residual dynamic configuration (μ', e_d)
which is then evaluated normally. We prove type-safety for
stage-one evaluation; its statement follows.

Theorem 6.2 (Stage-one Type Safety) If •; Σ; • ⊢ e : τ then: either
e is a dynamic expression e_d; or, for every store μ such that
⊢ μ : Σ, we have: either (μ, e) →_s error, or, there exists an e', a new
store typing Σ' ⊇ Σ and a new store μ' such that: (μ, e) →_s (μ', e');
⊢ μ' : Σ'; and •; Σ'; • ⊢ e' : τ.
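
A rough feel for stage-one evaluation can be had from the following
Python sketch of a two-stage interpreter for a tiny expression
language. It is an illustration under simplifying assumptions, not the
VeriML semantics: there is no store, no typing, and static values are
spliced in as literals. The stage_one function reduces every letstatic,
including those under a binder, and leaves a residual expression for
the ordinary evaluator.

```python
def subst(expr, name, value):
    """Replace free occurrences of variable `name` by the literal `value`."""
    tag = expr[0]
    if tag == "lit":
        return expr
    if tag == "var":
        return ("lit", value) if expr[1] == name else expr
    if tag in ("add", "app"):
        return (tag, subst(expr[1], name, value), subst(expr[2], name, value))
    if tag == "lam":
        _, x, body = expr
        return expr if x == name else ("lam", x, subst(body, name, value))
    _, x, bound, body = expr  # letstatic
    new_body = body if x == name else subst(body, name, value)
    return ("letstatic", x, subst(bound, name, value), new_body)

def evaluate(expr, env):
    """Ordinary (second-stage) evaluation; letstatic must be gone by now."""
    tag = expr[0]
    if tag == "lit":
        return expr[1]
    if tag == "var":
        return env[expr[1]]
    if tag == "add":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if tag == "lam":
        return lambda arg: evaluate(expr[2], {**env, expr[1]: arg})
    if tag == "app":
        return evaluate(expr[1], env)(evaluate(expr[2], env))
    raise ValueError("letstatic must be eliminated by stage one")

def stage_one(expr):
    """Evaluate every letstatic, even under binders, and return the residual."""
    tag = expr[0]
    if tag in ("lit", "var"):
        return expr
    if tag in ("add", "app"):
        return (tag, stage_one(expr[1]), stage_one(expr[2]))
    if tag == "lam":
        return ("lam", expr[1], stage_one(expr[2]))
    _, x, bound, body = expr  # letstatic
    # Empty environment: the static part may not mention dynamic variables.
    value = evaluate(stage_one(bound), {})
    return stage_one(subst(body, x, value))
```

In this sketch a static expression that mentions a run-time variable
fails with a KeyError during stage one, which loosely mirrors the role
of the restricted context in the typing rule for letstatic.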

Collapsing extension variables. Last, we have proved the fact
that under the conditions described in Sec. 5, it is possible to
collapse a term t into a term t' which is typed under the empty
extension variables context; a substitution σ with which we can regain
the original term t exists. This suggests that whenever a proof
object t for a specific proposition is required, an equivalent proof
object that does not mention uninstantiated extension variables exists.
Therefore, we can write an equivalent proof script producing the
collapsed proof object instead, and evaluate that script statically.
The statement of this theorem is the following:

Theorem 6.3 If Ψ ⊢ [Φ] t : [Φ] t_T and collapsible(Ψ ⊢ [Φ] t : [Φ] t_T),
then there exist Φ', t', t'_T and σ such that • ⊢ Φ' wf, • ⊢ [Φ'] t' :
[Φ'] t'_T, Ψ; Φ ⊢ σ : Φ', t' · σ = t and t'_T · σ = t_T.

The main idea behind the proof is to maintain a number of
substitutions and their inverses: one to go from a general Ψ extension
context into an "equivalent" context Ψ', which includes only
definitions of the form [Φ] t, for a constant context Φ that uses no
extension variables. Then, another substitution and its inverse are
maintained to go from that extension variables context into the empty
one; this is simpler, since terms typed under Ψ' are already
essentially free of metavariables. The computational content within
the proof amounts to a procedure for transforming proof scripts inside
tactics into static proof scripts.

7.  Implementation 

We have completed a prototype implementation of the VeriML
language, as described in this paper, that supports all of our
claims. We have built on our existing prototype [Stampoulis
and Shao 2010] and have added an extensive set of new features
and improvements. The prototype is written in OCaml and
is about 6k lines of code. Using the prototype we have implemented
a number of examples, which are about 1.5k lines of code.
Readers are encouraged to download and try the prototype from
http://flint.cs.yale.edu/publications/supc.html.

New features. We have implemented the new features we have
described so far: context matching, non-linear patterns,
proof-erasure semantics, staging, and inferencing for logical and
computational terms. Proof-erasure semantics are utilized only if
requested by a per-function flag, enabling us to selectively "trust"
tactics. The staging construct we support is more akin to the ⟨·⟩_static
form described as syntactic sugar in Sec. 5, and it is able to infer
the collapsing substitutions that are needed, following the approach
used in our metatheory.

Changes. We have also changed quite a number of things in the
prototype and improved many of its aspects. A central change,
mediated by our new treatment of the conversion rule, was to modify
the logic in use so that it follows the explicit equality approach;
the existing prototype used the λHOL_c logic. We also switched the
variable representation to the hybrid deBruijn levels-deBruijn indices
technique we described, which enabled us to implement subtyping
based on context subsumption. Also, we have adapted the typing
rules of the pattern matching construct in order to support refining
the environment based on the current branch.

Examples implemented. We have implemented a number of examples
to support our claims. First, we have written the type-safe
conversion check routine for βN, and extended it to support
congruence closure based on equalities in the context. Proofs of this
latter tactic are constructed automatically through static proof
scripts, using a naive rewriter that is non-terminating in the general
case. We have also completed proofs for theorems of arithmetic for the
properties of addition and multiplication, and used them to write an
arithmetic simplification tactic. All of the theorems are proved by
making essential use of existing conversion rules, and are
immediately added into new conversion rules, leading to a compact and
clean development style. The resulting code does not need to make
use of translation validation or proof by reflection, which are
typically used to implement similar tactics in existing proof
assistants.

Towards a practical proof assistant. In order to facilitate
practical proof and program construction in VeriML, we introduced some
features to support surface syntax, enabling users to omit most
details about the environments of contextual terms and the
substitutions used with meta-variables. This syntax follows the style
of Delphin, assuming an ambient logical variable environment which
is extended through a construct denoted as νx : t.e. Still, the full
power of contextual modal type theory is available, which is
crucial for changing the current ambient environment; as we saw
earlier, this is used for static calls to tactics. In general the
surface syntax leads to much more concise and readable code.

Last, we introduced syntax support for calls to tactics, enabling
users to write proof expressions that look very similar to proof
scripts in current proof assistants. We developed a rudimentary
ProofGeneral mode for VeriML that enables us to call the VeriML
type-checker and interpreter for parts of source files. By adding
holes to our sources, we can be informed by the type inference
mechanism about their expected types. Those types correspond to
what the current "proof state" is at that point. Therefore, a
possible workflow for developing tactics or proofs is to write the
known parts, insert holes at the missing points to see what remains to
be proved, and call the typechecker to get the proof state
information. This workflow corresponds closely to the interactive proof
development support in proof assistants like Coq and Isabelle, but
generalizes it to the case of tactics as well.

8.  Related  work 

There  is  a  large  body  of  work  that  is  related  to  the  ideas  we  have 
presented  here. 

Techniques for robust proof development. There have been
multiple proposals for making proof development inside existing
proof assistants more robust. A well-known technique is
proof-by-reflection [Boutin 1997]: writing total and certified
decision procedures within the functional language contained in a
logic like CIC. A recently introduced technique is automation through
canonical structures [Gonthier et al. 2011]: the resolution mechanism
for finding instances of canonical structures (a generalization of
type classes) is cleverly utilized in order to program automation
procedures for specific classes of propositions. We view both
approaches as somewhat similar, as both are based on cleverly
exploiting static "interpreters" that are available in a modern proof
assistant: the partial evaluator within the conversion rule in the
former case; the unification algorithm within instance discovery in
the latter case.

Our approach can thus be seen as similar, but also as a
generalization of these approaches, since a general-purpose
programming model is supported. Therefore, users do not have to adapt
to a specific programming style for writing automation code, but can
rather use a familiar functional language. Proof-by-reflection could
perhaps be used to support the same kind of extensions to the
conversion rule; still, this would require reflecting a large part of
the logic in itself, through a prohibitively complicated encoding. Both
techniques are applicable to our setting as well and could be used
to provide benefits to large developments within our language.

The style advocated in Chlipala [2011] (and elsewhere) suggests
that proper proof engineering entails developing sophisticated
automation tactics in a modular style, and extending their power by
adding proved lemmas as hints. We are largely inspired by this
approach, and believe that our introduction of the extensible
conversion rule and static checking of tactics can significantly
benefit it. We demonstrate similar ideas in layering conversion
tactics.

Traditional proof assistants. There are many parallels of our
work with the LCF family of proof assistants, like HOL4 [Slind and
Norrish 2008] and HOL-Light [Harrison 1996], which have served
as inspiration. First, the foundational logic that we use is similar.
Also, our use of a dedicated ML-like programming language to
program tactics and proof scripts is similar to the approach taken
by HOL4 and HOL-Light. Last, the fact that no proof objects need
to be generated is shared. Still, checking a proof script in HOL
requires evaluating it fully. Using our approach, we can selectively
evaluate parts of proof scripts; we focus on conversion-like tactics,
but we are not limited inherently to those. This is only possible
because our proof scripts carry proof state information within their
types. Similarly, proof scripts contained within LCF tactics cannot
be evaluated statically, so it is impossible to establish their validity
upon tactic definition. It is possible to do a transformation similar to
ours manually (lifting proof scripts into auxiliary lemmas that are
proved prior to the tactic), but the lack of type information means
that many more details need to be provided.

The Coq proof assistant [Barras et al. 2010] is another obvious
point of reference for our work. We will focus on the conversion
rule that CIC, its accompanying logic, supports; the same problems
with respect to proof scripts and tactics that we described in
the LCF case also apply for Coq. The conversion rule, which
identifies computationally equivalent propositions, coupled with the
rich type universe available, opens up many possibilities for
constructing small and efficiently checkable proof objects. The
implementation of the conversion rule needs to be part of the trusted
base of the proof assistant. Also, the fact that the conversion check
is built into the proof assistant makes the supported equivalence
rigid and non-extensible by frequently used decision procedures.

There is a large body of work that aims to extend the conversion
rule to arbitrary confluent rewrite systems (e.g. Blanqui et al.
[1999]) and to include decision procedures [Strub 2010]. These
approaches assume a smaller or larger addition to the trusted
base, and extend the already complex metatheory of Coq.
Furthermore, the NuPRL proof assistant [Constable et al. 1986] is based
on extensional type theory which includes an extensional conversion
rule. This enables complex decision procedures to be part of
conversion; but it results in a very large trusted base. We show how,
for a subset of these type theories, the conversion check can be
recovered outside the trusted base. It can be extended with arbitrarily
complex new tactics, written in a familiar programming style,
without any metatheoretic additions and without hurting the soundness
of the logic. The question of whether these type theories can be
supported in full remains as future work, but as far as we know,
there is no inherent limitation to our approach.

Dependently-typed programming. The large body of work on
dependently-typed languages has close parallels to our work. Out
of the multitude of proposals, we consider the Russell framework
[Sozeau 2006] as the current state-of-the-art, because of its high
expressivity and automation in discharging proof obligations. In
our setting, we can view dependently-typed programming as a
specific case of tactics producing complex data types that include
proof objects. Static proof scripts can be leveraged to support
expressivity similar to the Russell framework. Furthermore, our
approach opens up a new intriguing possibility: dependently-typed
programs whose obligations are discharged statically and
automatically, through code written within the same language.

Last, we have been largely inspired by the work on languages
like Beluga [Pientka and Dunfield 2008] and Delphin [Poswolsky
and Schürmann 2008], and build on our previous work on VeriML
[Stampoulis and Shao 2010]. We investigate how to leverage
type-safe tactics, as well as a number of new constructs we introduce,
so as to offer an extensible notion of proof checking. Also, we address
the issue of statically checking the proof scripts contained within
tactics written in VeriML. As far as we know, our development is
the first time languages such as these have been demonstrated to
provide a workflow similar to interactive proof assistants.
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Abstract 

Despite recent successes, large-scale proof development within proof assistants remains an arcane art that is
extremely time-consuming. We argue that this can be attributed to two profound shortcomings in the architecture
of modern proof assistants. The first is that proofs need to include a large amount of minute detail; this is due to
the rigidity of the proof checking process, which cannot be extended with domain-specific knowledge. In order
to avoid these details, we rely on developing and using tactics, specialized procedures that produce proofs.
Unfortunately, tactics are both hard to write and hard to use, revealing the second shortcoming of modern proof
assistants. This is because there is no static knowledge about their expected use and behavior.

As has recently been demonstrated, languages that allow type-safe manipulation of proofs, like Beluga,
Delphin and VeriML, can be used to partly mitigate this second issue, by assigning rich types to tactics. Still,
the architectural issues remain. In this paper, we build on this existing work, and demonstrate two novel ideas:
an extensible conversion rule and support for static proof scripts. Together, these ideas enable us to support both
user-extensible proof checking, and sophisticated static checking of tactics, leading to a new point in the design
space of future proof assistants. Both ideas are based on the interplay between a light-weight staging construct
and the rich type information available.

Categories  and  Subject  Descriptors  D.3.1  [Programming  Languages]:  Formal  Definitions  and  Theory 
General Terms Languages, Verification

1.  Introduction 

There have been various recent successes in using proof assistants to construct foundational proofs of large
software, like a C compiler [Leroy 2009] and an OS microkernel [Klein et al. 2009], as well as complicated
mathematical proofs [Gonthier 2008]. Despite this success, the process of large-scale proof development using
the foundational approach remains a complicated endeavor that requires significant manual effort and is plagued
by various architectural issues.

The big benefit of using a foundational proof assistant is that the proofs involved can be checked for validity
using a very small proof checking procedure. The downside is that these proofs are very large, since proof
checking is fixed. There is no way to add domain-specific knowledge to the proof checker, which would enable
proofs that spell out fewer details. There is good reason for this, too: if we allowed arbitrary extensions of the
proof checker, we could very easily permit it to accept invalid proofs.

Because of this lack of extensibility in the proof checker, users rely on tactics: procedures that produce proofs.
Users are free to write their own tactics, which can create domain-specific proofs. In fact, developing
domain-specific tactics is considered to be good engineering when doing large developments, leading to significantly
decreased overall effort, as shown, e.g., in Chlipala [2011]. Still, using and developing tactics is error-prone.
Tactics  are  essentially  untyped  functions  that  manipulate  logical  terms,  and  thus  tactic  programming  is  untyped. 
This  means  that  common  errors,  like  passing  the  wrong  argument,  or  expecting  the  wrong  result,  are  not  caught 
statically.  Exacerbating  this,  proofs  contained  within  tactics  are  not  checked  statically,  when  the  tactic  is  defined. 
Therefore,  even  if  the  tactic  is  used  correctly,  it  could  contain  serious  bugs  that  manifest  only  under  some 
conditions. 

With  the  recent  advent  of  programming  languages  that  support  strongly  typed  manipulation  of  logical 
terms, such as Beluga [Pientka and Dunfield 2008], Delphin [Poswolsky and Schürmann 2008] and VeriML
[Stampoulis  and  Shao  2010],  this  situation  can  be  somewhat  mitigated.  It  has  been  shown  in  Stampoulis  and 
Shao  [2010]  that  we  can  specify  what  kinds  of  arguments  a  tactic  expects  and  what  kind  of  proof  it  produces, 
leading  to  a  type-safe  programming  style.  Still,  this  does  not  address  the  fundamental  problem  of  proof  checking 
being  fixed  -  users  still  have  to  rely  on  using  tactics.  Furthermore,  the  proofs  contained  within  the  type-safe 
tactics  are  in  fact  proof-producing  programs,  which  need  to  be  evaluated  upon  invocation  of  the  tactic.  Therefore 
proofs  within  tactics  are  not  checked  statically,  and  they  can  still  cause  the  tactics  to  fail  upon  invocation. 

In  this  paper,  we  build  on  the  past  work  on  these  languages,  aiming  to  solve  both  of  these  issues  regarding 
the  architecture  of  modern  proof  assistants.  We  introduce  two  novel  ideas:  support  for  an  extensible  conversion 
rule  and  static  proof  scripts  inside  tactics.  The  former  technique  enables  proof  checking  to  become  user- 
extensible,  while  maintaining  the  guarantee  that  only  logically  sound  proofs  are  admitted.  The  latter  technique 
allows  for  statically  checking  the  proofs  contained  within  tactics,  leading  to  increased  guarantees  about  their 
runtime  behavior.  Both  techniques  are  based  on  the  same  mechanism,  which  consists  of  a  light-weight  staging 
construct.  There  is  also  a  deep  synergy  between  them,  allowing  us  to  use  the  one  to  the  benefit  of  the  other. 

Our  main  contributions  are  the  following: 

•  First,  we  present  what  we  believe  is  the  first  technique  for  having  an  extensible  conversion  rule,  which 
combines  the  following  characteristics:  it  is  safe,  meaning  that  it  preserves  logical  soundness;  it  is  user- 
extensible,  using  a  familiar,  generic  programming  model;  and,  it  does  not  require  metatheoretic  additions  to 
the  logic,  but  can  be  used  to  simplify  the  logic  instead. 

•  Second,  building  on  existing  work  for  typed  tactic  development,  we  introduce  static  checking  of  the  proof 
scripts  contained  within  tactics.  This  significantly  reduces  the  development  effort  required,  allowing  us  to 
write  tactics  that  benefit  from  existing  tactics  and  from  the  rich  type  information  available. 

•  Third,  we  show  how  typed  proof  scripts  can  be  seen  as  an  alternative  form  of  proof  witness,  which  falls 
between  a  proof  object  and  a  proof  script.  Receivers  of  the  certificate  are  able  to  decide  on  the  tradeoff 
between  the  level  of  trust  they  show  and  the  amount  of  resources  needed  to  check  its  validity. 

In  terms  of  technical  contributions,  we  present  a  number  of  technical  advances  in  the  metatheory  of 
the  aforementioned  programming  languages.  These  include  a  simple  staging  construct  that  is  crucial  to  our 
development  and  a  new  technique  for  variable  representation.  We  also  show  a  condition  under  which  static 
checking of proof scripts inside tactics is possible. Last, we have extended an existing prototype implementation
with  a  significant  number  of  features,  enabling  it  to  support  our  claims,  while  also  rendering  its  use  as  a  proof 
assistant  more  practical. 

2.  Informal  presentation 

Glossary of terms. We will start off by introducing some concepts that will be used throughout the paper. The
first fundamental concept we will consider is the notion of a proof object: given a derivation of a proposition
inside a formal logic, a proof object is a term representation of this derivation. A proof checker is a program
that can decide whether a given proof object is a valid derivation of a specific proposition or not. Proof objects
are extremely verbose and are thus hard to write by hand. For this reason, we use tactics: functions that produce
proof objects. By combining tactics together, we create proof-producing programs, which we call proof scripts.
If a proof script is evaluated, and the evaluation completes successfully, the resulting proof object can be checked
using the original proof checker. In this way, the trusted base of the system is kept at the absolute minimum.
The language environment where proof scripts and tactics are written and evaluated is called a proof assistant;
evidently, it needs to include a proof checker.

[Figure 1, panel (a): HOL approach (labels: static, dynamic). Graphic not reproduced.]
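
The glossary terms can be made concrete with a toy example. The following is a minimal Python sketch, not VeriML and not any existing proof assistant: the logic has only atoms and implication, and all names (Atom, Imp, check, identity_tactic, modus_ponens_tactic) are hypothetical. It shows a small trusted checker, two untrusted tactics, and a proof script whose result is validated by the checker.

```python
from dataclasses import dataclass

# Propositions: atoms and implications (a deliberately tiny logic).
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Imp:
    left: object
    right: object

# Proof objects: terms of a simply typed lambda calculus.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    prop: object
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def check(ctx, proof):
    """Trusted proof checker: return the proposition proved by `proof` under `ctx`."""
    if isinstance(proof, Var):
        return ctx[proof.name]
    if isinstance(proof, Lam):
        body_prop = check({**ctx, proof.var: proof.prop}, proof.body)
        return Imp(proof.prop, body_prop)
    if isinstance(proof, App):
        fn_prop = check(ctx, proof.fn)
        arg_prop = check(ctx, proof.arg)
        if not isinstance(fn_prop, Imp) or fn_prop.left != arg_prop:
            raise TypeError("invalid application")
        return fn_prop.right
    raise TypeError("unknown proof object")

def identity_tactic(prop):
    """Untrusted tactic: build a proof object for prop -> prop."""
    return Lam("h", prop, Var("h"))

def modus_ponens_tactic(proof_imp, proof_arg):
    """Untrusted tactic: combine a proof of P -> Q with a proof of P."""
    return App(proof_imp, proof_arg)

# A proof script combines tactics; only the checker decides validity.
A = Atom("A")
script_result = modus_ponens_tactic(identity_tactic(Imp(A, A)), identity_tactic(A))
proved = check({}, script_result)
```

Here the tactics are ordinary untyped functions, so a mistake in the script is only detected once the script has been evaluated and its result passed to check.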

Checking  proof  objects.  In  order  to  keep  the  size  of  proof  objects  manageable,  many  of  the  logics  used  for 
mechanized  proof  checking  include  a  conversion  rule.  This  rule  is  used  implicitly  by  the  proof  checker  to 
decide  whether  any  two  propositions  are  equivalent;  if  it  determines  that  they  are  indeed  so,  the  proof  of  their 
equivalence  can  be  omitted.  We  can  thus  think  of  it  as  a  special  tactic  that  is  embedded  within  the  proof  checker, 
and  used  implicitly. 

The more sophisticated the relation supported by the conversion rule is, the simpler proof objects are to write,
since more details can be omitted. On the other hand, the proof checker becomes more complicated, as does
the metatheory proof showing the soundness of the associated logic. The choice in Coq [Barras et al. 2010],
one of the most widely used proof assistants, with respect to this trade-off, is to have a conversion rule that
identifies propositions up to evaluation. Nevertheless, extended notions of conversion are desirable, leading to
proposals like CoqMT [Strub 2010], where equivalence up to first-order theories is supported. In both cases, the
conversion rule is fixed, and extending it requires significant amounts of work. It is thus not possible for users
to extend it using their own, domain-specific tactics, and proof objects are thus bound to get large. This is why
we have to resort to writing proof scripts.
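
To illustrate what identifying propositions "up to evaluation" buys, here is a small Python sketch under simplifying assumptions: terms are closed arithmetic expressions encoded as nested tuples, and the function names are hypothetical. A checker without conversion accepts a reflexivity proof only for syntactically identical sides, while a checker with a conversion rule first normalizes both sides.

```python
def normalize(term):
    """Evaluate a closed arithmetic term, e.g. ("add", 2, 2), to an integer."""
    if isinstance(term, int):
        return term
    op, left, right = term
    l, r = normalize(left), normalize(right)
    if op == "add":
        return l + r
    if op == "mul":
        return l * r
    raise ValueError(f"unknown operator: {op}")

def check_refl_strict(lhs, rhs):
    """Without conversion: reflexivity proves lhs = rhs only if both are identical."""
    return lhs == rhs

def check_refl_conv(lhs, rhs):
    """With conversion: both sides are identified up to evaluation."""
    return normalize(lhs) == normalize(rhs)
```

With the strict checker, a proof of 2 + 2 = 4 has to spell out the computation steps explicitly; with the conversion-based checker the reflexivity proof alone suffices, which is the sense in which details can be omitted from proof objects.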

Checking proof scripts. As mentioned earlier, in order to validate a proof script we need to evaluate it (see Fig.
1a); this is the modus operandi in proof assistants of the HOL family [Harrison 1996; Slind and Norrish 2008].
Therefore, it is easy to extend the checking procedure for proof scripts by writing a new tactic, and calling
it as part of a script. The price of this is that there is no way to have any sort of static guarantee
about the validity of the script, as proof scripts are completely untyped. This can be somewhat mitigated in Coq
by utilizing the static checking that it already supports: the proof checker, and especially, the conversion rule
it contains (see Fig. 1b). We can employ proof objects in our scripts; this is especially useful when the proof
objects are trivial to write but trigger complex conversion checks. This is the essential idea behind techniques
like proof-by-reflection [Boutin 1997], which lead to more robust proof scripts.

In previous work [Stampoulis and Shao 2010] we introduced VeriML, a language that enables programming
tactics and proof scripts in a typeful manner using a general-purpose, side-effectful programming model.
Combining typed tactics leads to typed proof scripts. These are still programs producing proof objects, but
the proposition they prove is carried within their type. Information about the current proof state (the set of
hypotheses and goals) is also available statically at every intermediate point of the proof script. In this way, the
static assurances about proof scripts are significantly increased and many potential sources of type errors are
removed. On the other hand, the proof objects contained within the scripts are still checked using a fixed proof
checker; this ultimately means that the set of possible static guarantees is still fixed.

Extensible conversion rule. In this paper, we build on our earlier work on VeriML. In order to further increase
the amount of static checking of proof scripts that is possible within this language, we propose the notion of an
extensible conversion rule (see Fig. 1c). It enables users to write their own domain-specific conversion checks
that get included in the conversion rule. This leads to simpler proof scripts, as more parts of the proof can be
inferred by the conversion rule and can therefore be omitted. Also, it leads to increased static guarantees for
proof scripts, since the conversion checks happen before the rest of the proof script is evaluated.

The way we achieve this is by programming the conversion checks as type-safe tactics within VeriML, and
then evaluating them statically using a simple staging mechanism (see Fig. 2). The type of the conversion tactics
requires that they produce a proof object which proves the claimed equivalence of the propositions. In this way,
type safety of VeriML guarantees that soundness is maintained. At the same time, users are free to extend the
conversion rule with their own conversion tactics written in a familiar programming model, without requiring
any metatheoretic additions or termination proofs. Such proofs are only necessary if decidability of the extra
conversion  checks  is  desired.  Furthermore,  this  approach  allows  for  metatheoretic  reductions  as  the  original 
conversion  rule  can  be  programmed  within  the  language.  Thus  it  can  be  removed  from  the  logic,  and  replaced 
by  the  simpler  notion  of  explicit  equalities,  leading  to  both  simpler  metatheory  and  a  smaller  trusted  base. 
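
The idea of user-written conversion tactics that must justify their answer with explicit equality proofs can be sketched in Python as follows. This is an illustration under stated assumptions rather than the VeriML mechanism: VeriML obtains the guarantee from its type system, whereas this sketch re-checks every returned proof with a small trusted kernel. The proof constructors (refl, comm, sym, trans) and all function names are hypothetical.

```python
def check_eq(proof):
    """Trusted kernel: return the (lhs, rhs) equality that `proof` establishes."""
    tag = proof[0]
    if tag == "refl":                       # t = t
        return proof[1], proof[1]
    if tag == "comm":                       # a + b = b + a
        _, a, b = proof
        return ("add", a, b), ("add", b, a)
    if tag == "sym":
        lhs, rhs = check_eq(proof[1])
        return rhs, lhs
    if tag == "trans":
        lhs, mid1 = check_eq(proof[1])
        mid2, rhs = check_eq(proof[2])
        if mid1 != mid2:
            raise ValueError("trans: middle terms differ")
        return lhs, rhs
    raise ValueError(f"unknown proof constructor: {tag}")

# User-extensible list of untrusted conversion tactics.
conversion_tactics = []

def convertible(lhs, rhs):
    """Accept lhs = rhs only if some tactic returns a proof the kernel validates."""
    for tactic in conversion_tactics:
        try:
            proof = tactic(lhs, rhs)
            if proof is not None and check_eq(proof) == (lhs, rhs):
                return True
        except ValueError:
            continue  # a malformed proof counts as failure, never as success
    return False

def refl_tactic(lhs, rhs):
    return ("refl", lhs) if lhs == rhs else None

def comm_tactic(lhs, rhs):
    if isinstance(lhs, tuple) and lhs[0] == "add":
        return ("comm", lhs[1], lhs[2])
    return None

def buggy_tactic(lhs, rhs):
    return ("refl", lhs)  # wrongly claims that any two terms are equal

conversion_tactics.extend([refl_tactic, comm_tactic])
```

Registering buggy_tactic as well does not make convertible accept a false equality, because the proof it returns establishes a different statement from the one requested. This mirrors the soundness argument in the text: extensions can fail, but they cannot introduce invalid equivalences.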

Checking  tactics.  The  above  approach  addresses  the  issue  of  being  able  to  extend  the  amount  of  static 
checking  possible  for  proof  scripts.  But  what  about  tactics?  Our  existing  work  on  VeriML  shows  how  the 
increased  type  information  addresses  some  of  the  issues  of  tactic  development  using  current  proof  assistants, 
where  tactics  are  programmed  in  a  completely  untyped  manner. 

Figure 2. Staging in VeriML

Still,  if  we  consider  the  case  of  tactics  more  closely,  we  will  see  that  there  is  a  limitation  to  the  amount  of 
checking  that  is  done  statically,  even  using  this  language.  When  programming  a  new  tactic,  we  would  like  to 
reuse  existing  tactics  to  produce  the  required  proofs.  Therefore,  rather  than  writing  proof  objects  by  hand  inside 
the  code  of  a  tactic,  we  would  rather  use  proof  scripts.  The  issue  is  that  in  order  to  check  whether  the  contained 
proof  scripts  are  valid,  they  need  to  be  evaluated  -  but  this  only  happens  when  an  invocation  of  the  tactic  reaches 
the  point  where  the  proof  script  is  used.  Therefore,  the  static  guarantees  that  this  approach  provides  are  severely 
limited  by  the  fact  that  the  proof  scripts  inside  the  tactics  cannot  be  checked  statically,  when  the  tactic  is  defined. 

Static  proof  scripts.  This  is  the  second  fundamental  issue  we  address  in  this  paper.  We  show  that  the  same 
staging  construct  utilized  for  introducing  the  extensible  conversion  rule,  can  be  leveraged  to  perform  static  proof 
checking  for  tactics.  The  crucial  point  of  our  approach  is  the  proof  of  existence  of  a  transformation  between 
proof  objects,  which  suggests  that  under  reasonable  conditions,  a  proof  script  contained  within  a  tactic  can  be 
transformed  into  a  static  proof  script.  This  static  script  can  then  be  evaluated  at  tactic  definition  time,  to  be 
checked  for  validity. 

Last,  we  will  show  that  this  approach  lends  itself  well  to  writing  extensions  of  the  conversion  rule.  We  show 
that  we  can  create  a  layering  of  conversion  rules:  using  a  basic  conversion  rule  as  a  starting  point,  we  can  utilize 
it  inside  static  proof  scripts  to  implicitly  prove  the  required  obligations  of  a  more  advanced  version,  and  so 
on.  This  minimizes  the  required  user  effort  for  writing  new  conversion  rules,  and  enables  truly  modular  proof 
checking. 

3.  Our  toolbox 

In  this  section,  we  will  present  the  essential  ingredients  that  are  needed  for  the  rest  of  our  development.  The 
main  requirement  is  a  language  that  supports  type-safe  manipulation  of  terms  of  a  particular  logic,  as  well 
as  a  general-purpose  programming  model  that  includes  general  recursion  and  other  side-effectful  operations. 
Two recently proposed languages for manipulating LF terms, Beluga [Pientka and Dunfield 2008] and Delphin
[Poswolsky and Schürmann 2008], fit this requirement, as does VeriML [Stampoulis and Shao 2010], which is a
language  used  to  write  type-safe  tactics.  Our  discussion  is  focused  on  the  latter,  as  it  supports  a  richer  ML-style 
calculus  compared  to  the  others,  something  useful  for  our  purposes.  Still,  our  results  apply  to  all  three. 

We will now briefly describe the constructs that these languages support, as well as some new extensions that
we propose. The interested reader can read more about these constructs in Sec. 6 and in the appendix.

A formal logic. The computational language we are presenting is centered around manipulation of terms of a
specific formal logic. We will see more details about this logic in Sec. 4. For the time being, it will suffice
to present a set of assumptions about the syntactic classes and typing judgements of this logic, shown in
Fig. 3. Logical terms are represented by the syntactic class t, and include proof objects, propositions, terms
corresponding to the domain of discourse (e.g. natural numbers), and the needed sorts and type constructors to
classify such terms. Their variables are assigned types through an ordered context Φ. A package of a logical
term t together with the variables context Φ it inhabits is called a contextual term and denoted as T = [Φ] t. Our
computational language works over contextual terms for reasons that will be evident later. The logic incorporates

t ::= proof object constructors | propositions | natural numbers, lists, etc. | sorts and types | X/σ
Φ ::= • | Φ, x : t        T ::= [Φ] t
Ψ ::= • | Ψ, X : T        σ ::= • | σ, t

main judgement: Ψ; Φ ⊢ t : t' (type of a logical term)


Figure 3. Assumptions about the logic language


k ::= * | k1 → k2
τ ::= unit | int | bool | τ1 → τ2 | τ1 + τ2 | τ1 × τ2 | μα : k.τ | ∀α : k.τ | α | array τ | λα : k.τ | τ1 τ2 | ···
e ::= () | n | e1 + e2 | e1 ≤ e2 | true | false | if e then e1 else e2 | λx : τ.e | e1 e2 | (e1, e2) | proj_i e | inj_i e
    | case(e, x1.e1, x2.e2) | fold e | unfold e | Λα : k.e | e τ | fix x : τ.e | mkarray(e, e') | e[e'] | e[e'] := e''
    | l | error
Γ ::= • | Γ, x : τ | Γ, α : k        Σ ::= • | Σ, l : array τ


Figure 4. Syntax for the computational language (ML fragment)


τ ::= ··· | (X : T) → τ | (X : T) × τ | (φ : ctx) → τ
e ::= ··· | λX : T.e | e T | λφ : ctx.e | e Φ | ⟨T, e⟩ | let ⟨X, x⟩ = e in e'
    | holcase T return τ of (T1 ↦ e1) ··· (Tn ↦ en) | ctxcase Φ return τ of (Φ1 ↦ e1) ··· (Φn ↦ en)


Figure 5. Syntax for the computational language (logical term constructs)


such terms by allowing them to get substituted for meta-variables X, using the constructor X/σ. When a term
T = [Φ'] t gets substituted for X, we go from the Φ' context to the current context Φ using the substitution σ.

Logical terms are classified using other logical terms, based on the normal variables environment Φ, and also
an environment Ψ that types meta-variables, thus leading to the Ψ; Φ ⊢ t : t' judgement. For example, a term t
representing a closed proposition will be typed as •; • ⊢ t : Prop, while a proof object t_pf proving that proposition
will satisfy the judgement •; • ⊢ t_pf : t.

ML-style functional programming. We move on to the computational language. As its main core, we assume
an ML-style functional language, supporting general recursion, algebraic data types and mutable references
(see Fig. 4). Terms of this fragment are typed under a computational variables environment Γ and a store typing
environment Σ, mapping mutable locations to types. Typing judgements are entirely standard, leading to a
Σ; Γ ⊢ e : τ judgement for typing expressions.

Dependently-typed programming over logical terms. As shown in Fig. 5, the first important additions to
the ML computational core are constructs for dependent functions and products over contextual terms T.
Abstraction over contextual terms is denoted as λX : T.e. It has the dependent function type (X : T) → τ. The type
is dependent since the introduced logical term might be used as the type of another term. An example would be a
function that receives a proposition plus a proof object for that proposition, with type: (P : Prop) → (X : P) → τ.
Dependent products that package a contextual logical term with an expression are introduced through the ⟨T, e⟩
construct and eliminated using let ⟨X, x⟩ = e in e'; their type is denoted as (X : T) × τ. Especially for packages
of proof objects with the unit type, we introduce the syntax LT(T).

Last, in order to be able to support functions that work over terms in any context, we introduce context
polymorphism, through a similarly dependent function type over contexts. With these in mind, we can define
a simple tactic that gets a packaged proof of a universally quantified formula, and an instantiation term, and
returns a proof of the instantiated formula as follows:

instantiate : (φ : ctx, T : [φ] Type, P : [φ, x : T] Prop, a : [φ] T)
              → LT([φ] ∀x : T, P) → LT([φ] P/[id_φ, a])
instantiate φ T P a pf = let ⟨H⟩ = pf in ⟨H a⟩

From here on, we will omit details about contexts and substitutions in the interest of presentation.

Pattern matching over terms. The most important new construct that VeriML supports is a pattern matching
construct over logical terms denoted as holcase. This construct is used for dependent matching of a logical term
against a set of patterns. The return clause specifies its return type; we omit it when it is easy to infer. Patterns
are normal terms that include unification variables, which can be present under binders. This is the essential
reason why contextual terms are needed.

Pattern matching over environments. For the purposes of our development, it is very useful to support one
more pattern matching construct: matching over logical variable contexts. When trying to construct a certain
proof, the logical environment represents what the current proof context is: what the current logical hypotheses
at hand are, what types of terms have been quantified over, etc. By being able to pattern match over the
environment, we can “look up” things in our current set of hypotheses, in order to prove further propositions.
We can thus view the current environment as representing a simple form of the current proof state; the pattern
matching construct enables us to manipulate it in a type-safe manner.

One example is an “assumption” tactic, that tries to prove a proposition by searching for a matching
hypothesis in the context:

assumption : (φ : ctx, P : Prop) → option LT(P)
assumption φ P =
  ctxcase φ of
      φ', H : P ↦ return ⟨H⟩
    | φ', _ ↦ assumption φ' P
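
For readers less familiar with the notation, the search performed by the assumption tactic can be mimicked in Python. This is only an untyped sketch under simplifying assumptions: the context is a list of (hypothesis name, proposition) pairs, propositions are plain strings, and an explicit base case for the empty context is added here, which the listing above does not show.

```python
from typing import Optional

def assumption(context, goal) -> Optional[str]:
    """Return the name of a hypothesis whose proposition equals `goal`, if any.

    The context is searched from the most recently added hypothesis backwards,
    mirroring the two ctxcase branches of the listing above.
    """
    if not context:
        return None                      # base case assumed for this sketch
    *rest, (name, prop) = context
    if prop == goal:
        return name                      # corresponds to: φ', H : P ↦ return ⟨H⟩
    return assumption(rest, goal)        # corresponds to: φ', _ ↦ assumption φ' P
```

Unlike the VeriML version, nothing in the Python type of this function records that the returned hypothesis actually proves the goal; that relationship is precisely what the type option LT(P) captures.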

Proof object erasure semantics (new feature). The only construct that can influence the evaluation of a
program based on the structure of a logical term is the pattern matching construct. For our purposes, pattern
matching on proof objects is not necessary - we never look into the structure of a completed proof. Thus we can
have the typing rules of the pattern matching construct specifically disallow matching on proof objects.

In that case, we can define an alternate operational semantics for our language where all proof objects are
erased before using the original small-step reduction rules. Because of type safety, these proof-erasure semantics
are guaranteed to yield equivalent results: even if no proof objects are generated, they are still bound to exist.

Implicit arguments. Let us consider again the instantiate function defined earlier. This function expects five
arguments. From its type alone, it is evident that only the last two arguments are strictly necessary. The last
argument, corresponding to a proof expression for the proposition ∀x : T, P, can be used to reconstruct exactly
the arguments φ, T and P. Furthermore, if we know what the resulting type of a call to the function needs to be,
we can choose even the instantiation argument a appropriately. We employ a simple inference mechanism so
that such arguments are omitted from our programs. This feature is also crucial in our development in order to
implicitly maintain and utilize the current proof state within our proof scripts.

(sorts)          s ::= Type | Type'
(kinds)          K ::= Prop | Nat | K1 → K2
(props.)         P ::= P1 → P2 | ∀x : K.P | x | True | False | P1 ∧ P2
(dom.obj.)       d ::= Zero | Succ d | P | ···
(proof objects)  π ::= x | λx : P.π | π1 π2 | λx : K.π | π d | ···
(HOL terms)      t ::= s | K | P | d | π

Selected rules:

Intro
    Ψ; Φ, x : P ⊢ π : P'
    ------------------------------
    Ψ; Φ ⊢ λx : P.π : P → P'

Elim
    Ψ; Φ ⊢ π : P → P'      Ψ; Φ ⊢ π' : P
    ------------------------------------------
    Ψ; Φ ⊢ π π' : P'


Figure 6. Syntax and selected rules of the logic language λHOL


Conversion
    Ψ; Φ ⊢c π : P      P =βN P'
    --------------------------------
    Ψ; Φ ⊢c π : P'

Reduction d →βN d':
    (λx : K.d) d' →βN d[d'/x]
    natElim_K dz ds zero →βN dz
    natElim_K dz ds (succ d) →βN ds d (natElim_K dz ds d)

=βN is the compatible, reflexive, symmetric and transitive closure of →βN


Figure 7. Extending λHOL with the conversion rule (λHOLc)
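
The two natElim reduction rules of Fig. 7 can be exercised with a small evaluator. The Python sketch below is only an illustration: the encoding of numerals as nested tuples and the names nat_elim, add and numeral are assumptions, and the step case is passed as a Python function rather than as a logical term.

```python
ZERO = ("zero",)

def succ(d):
    return ("succ", d)

def numeral(k):
    """Build the unary numeral succ (succ ... zero) for a non-negative integer k."""
    d = ZERO
    for _ in range(k):
        d = succ(d)
    return d

def nat_elim(d_z, d_s, n):
    """Evaluate natElim following the two reduction rules:

        natElim d_z d_s zero      ->  d_z
        natElim d_z d_s (succ d)  ->  d_s d (natElim d_z d_s d)
    """
    if n == ZERO:
        return d_z
    _, pred = n
    return d_s(pred, nat_elim(d_z, d_s, pred))

def add(m, n):
    # Addition by recursion on m: zero + n = n, (succ m') + n = succ (m' + n).
    return nat_elim(n, lambda _pred, rec: succ(rec), m)
```

Under a conversion rule based on this reduction, a proposition mentioning add applied to the numerals for 2 and 2 is identified with the same proposition mentioning the numeral for 4, so no explicit proof of the computation is needed.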


Minimal  staging  support  (new  feature).  Using  the  language  we  have  seen  so  far  we  are  able  to  write  powerful 
tactics  using  a  general-purpose  programming  model.  But  what  if,  inside  our  programs,  we  have  calls  to  tactics 
where  all  of  their  arguments  are  constant?  Presumably,  those  tactic  calls  could  be  evaluated  to  proof  objects  prior 
to  tactic  invocation.  We  could  think  of  this  as  a  form  of  generalized  constant  folding,  which  has  one  intriguing 
benefit:  we  can  tell  statically  whether  the  tactic  calls  succeed  or  not. 

This  paper  is  exactly  about  exploring  this  possibility.  Towards  this  effect,  we  introduce  a  rudimentary  staging 
construct  in  our  computational  language.  This  takes  the  form  of  a  letstatic  construct,  which  binds  a  static 
expression  to  a  variable.  The  static  expression  is  evaluated  during  stage  one  (see  Fig.  2),  and  can  only  depend  on 
other static expressions. Details of this construct are presented in Fig. 11d and also in Sec. 6. After this addition,
expressions in our language have a three-phase lifetime, which is also shown in Fig. 2:

—  type-checking,  where  the  well-formedness  of  expressions  according  to  the  rules  of  the  language  is  checked, 
and  inference  of  implicit  arguments  is  performed 

—  static  evaluation,  where  expressions  inside  letstatic  are  reduced  to  values,  yielding  a  residual  expression 

—  run-time,  where  the  residual  expression  is  evaluated 
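
The difference between static evaluation and run-time evaluation can be imitated in Python with closures. This is a loose analogy, not the letstatic construct itself: "definition time" of a Python closure stands in for the static evaluation stage, and lookup_proof is a hypothetical stand-in for a closed proof script that may fail.

```python
def lookup_proof(goal):
    """Stand-in for a proof script: succeeds only for goals it knows how to prove."""
    known = {"A -> A": "proof_of_identity"}
    if goal not in known:
        raise LookupError(f"cannot prove {goal}")
    return known[goal]

def make_tactic_dynamic(goal):
    # The proof script runs on every invocation, so a failing script is
    # discovered only when the tactic is actually called.
    def tactic(x):
        return (x, lookup_proof(goal))
    return tactic

def make_tactic_static(goal):
    # Analogue of letstatic: the closed proof script is evaluated once, when
    # the tactic is defined, so a failing script is reported immediately.
    proof = lookup_proof(goal)
    def tactic(x):
        return (x, proof)
    return tactic
```

Defining a dynamic tactic for an unprovable goal succeeds silently and fails at its first use, whereas the static variant fails at definition. The static variant also runs the script only once, which corresponds to the view of staging as generalized constant folding.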

4. Extensible conversion rule


With these tools at hand, let us now return to the first issue that motivates us: the fact that proof checking is rigid
and cannot be extended with user-defined procedures. As we have said in our introduction, many modern proof
assistants are based on logics that include a conversion rule. This rule essentially identifies propositions up to
some equivalence relation: usually this is equivalence up to partial evaluation of the functions contained within
propositions.

The supported relation is decided when the logic is designed. Any extension to this relation requires a
significant amount of work, both in terms of implementation, and in terms of metatheoretic proof required.
This is evidenced by projects that extend the conversion rule in Coq, such as Blanqui et al. [1999] and Strub
[2010]. Even if user extensions are supported, those only take the form of first-order theories. Can we do better
than this, enabling arbitrarily complex user extensions, written with the full power of ML, yet maintaining
soundness?

It turns out that we can: this is the subject of this section. The key idea is to recognize that the conversion
rule is essentially a tactic, embedded within the type checker of the logic. Calls to this tactic are made implicitly
as part of checking a given proof object for validity. So how can we support a flexible, extensible alternative?
Instead of hardcoding a conversion tactic within the logic type checker, we can program a type-safe version of
the same tactic within VeriML, with the requirement that it provides proof of the claimed equivalence. Instead of
calling the conversion tactic as part of proof checking, we use staging to call the tactic statically - after (VeriML)
type checking, but before runtime execution. This can be viewed as a second, potentially non-terminating proof
checking stage. Users are now free to write their own conversion tactics, extending the static checking available
for proof objects and proof scripts. Still, soundness is maintained, since full proof objects in the original logic
can always be constructed. As an example, we have extended the conversion rule that we use by a congruence
closure procedure, which makes use of mutable data structures, and by an arithmetic simplification procedure.

4.1  Introducing:  the  conversion  rule 

First, let us present what the conversion rule really is in more detail. We will base our discussion on a simple
type-theoretic higher-order logic, based on the λHOL logic as described in Barendregt and Geuvers [1999], and
used in our original work on VeriML [Stampoulis and Shao 2010]. We can think of such a logic as composed
of the following broad classes: the objects of the domain of discourse d, which are the objects that the logic
reasons about, such as natural numbers and lists; their classifiers, the kinds 𝒦 (classified in turn by sorts s); the
propositions P; and the derivations, which prove that a certain proposition is true. We can represent derivations in
a linear form as terms π in a typed lambda-calculus; we call such terms proof objects, and their types represent
propositions in the logic. Checking whether a derivation is a valid proof of a certain proposition amounts to
type-checking its corresponding proof object. Some details of this logic are presented in Fig. 6; the interested
reader can find more information about it in the above references and in the appendix (Sec. A).

In Fig. 6, we show what the conversion rule looks like for this logic: it is a typing judgement that effectively
identifies propositions up to an equivalence relation, with respect to checking proof objects. We call this version
of the logic λHOL_c and use ⊢_c to denote its entailment relation. The equivalence relation we consider in the
conversion rule is evaluation up to β-reductions and uses of primitive recursion of natural numbers, denoted
as natElim. In this way, trivial arguments based on this notion of computation alone need not be witnessed, as
for example is the fact that (Succ x) + y = Succ (x + y), when the addition function is defined by primitive
recursion on the first argument. Of course, this is only a very basic use of the conversion rule. It is possible to
omit larger proofs through much more sophisticated uses. This leads to simpler proofs and smaller proof objects.
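To make this notion of "equal by computation alone" concrete, the following is a minimal Python sketch of a normalizer for addition defined by recursion on the first argument. It is not VeriML code: the tuple encoding of terms and the names normalize and convertible are illustrative assumptions, and only the two computation rules for + are modelled.

```python
# Terms (hypothetical encoding): ("var", name), ("zero",), ("succ", t), ("plus", a, b).

def normalize(t):
    """Evaluate a term as far as the two computation rules for + allow."""
    tag = t[0]
    if tag == "succ":
        return ("succ", normalize(t[1]))
    if tag == "plus":
        a, b = normalize(t[1]), normalize(t[2])
        if a == ("zero",):
            return b  # zero + b  ~>  b
        if a[0] == "succ":
            # (succ a') + b  ~>  succ (a' + b)
            return ("succ", normalize(("plus", a[1], b)))
        return ("plus", a, b)  # stuck: the first argument is a variable
    return t  # zero or a variable


def convertible(t1, t2):
    """Two terms are identified when their normal forms coincide."""
    return normalize(t1) == normalize(t2)
```

Under these assumptions, convertible(("plus", ("succ", x), y), ("succ", ("plus", x, y))) holds for variables x and y without any explicit proof, whereas x + zero and x are not identified, because recursion is on the first argument and that equation needs induction.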

Still, when using this approach, the choice of what relation is supported by the conversion rule needs to be
made during the definition of the logic. This choice permeates all aspects of the metatheory of the logic. It is
easy to see why, even with the tiny fragment of logic we have introduced. Most typing rules for proof objects in
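\[
\frac{\cdots}{\Psi;\,\Phi \vdash_e d_1 = d_2 : \mathsf{Prop}}
\qquad
\frac{\Psi;\,\Phi \vdash_e d : \mathcal{K}}{\Psi;\,\Phi \vdash_e \mathsf{refl}\ d : d = d}
\]

\[
\frac{\Psi;\,\Phi,\,x:\mathcal{K} \vdash_e P : \mathsf{Prop} \qquad \Psi;\,\Phi \vdash_e d_1 : \mathcal{K} \qquad \Psi;\,\Phi \vdash_e \pi : P[d_1/x] \qquad \Psi;\,\Phi \vdash_e \pi' : d_1 = d_2}
     {\Psi;\,\Phi \vdash_e \mathsf{leibniz}\ (\lambda x:\mathcal{K}.P)\ \pi\ \pi' : P[d_2/x]}
\]

\[
\frac{\Psi;\,\Phi,\,x:\mathcal{K} \vdash_e \pi : d_1 = d_2}
     {\Psi;\,\Phi \vdash_e \mathsf{lamEq}\ (\lambda x:\mathcal{K}.\pi) : (\lambda x:\mathcal{K}.d_1) = (\lambda x:\mathcal{K}.d_2)}
\]

\[
\frac{\Psi;\,\Phi,\,x:\mathcal{K} \vdash_e \pi : d_1 = d_2 \qquad \Psi;\,\Phi \vdash_e \cdots : \mathsf{Prop}}
     {\Psi;\,\Phi \vdash_e \mathsf{forallEq}\ (\lambda x:\mathcal{K}.\pi) : (\forall x:\mathcal{K}.d_1) = (\forall x:\mathcal{K}.d_2)}
\]

\[
\frac{\Psi;\,\Phi,\,x:\mathcal{K} \vdash_e d : \mathcal{K}' \qquad \Psi;\,\Phi \vdash_e d' : \mathcal{K}}
     {\Psi;\,\Phi \vdash_e \mathsf{betaEq}\ (\lambda x:\mathcal{K}.d)\ d' : (\lambda x:\mathcal{K}.d)\ d' = d[d'/x]}
\]

Axioms assumed:

\[
\begin{array}{lcl}
\mathsf{natElimBase}_{\mathcal{K}} & : & \forall f_z.\forall f_s.\ \mathsf{natElim}_{\mathcal{K}}\ f_z\ f_s\ \mathsf{zero} = f_z \\
\mathsf{natElimStep}_{\mathcal{K}} & : & \forall f_z.\forall f_s.\forall n.\ \mathsf{natElim}_{\mathcal{K}}\ f_z\ f_s\ (\mathsf{succ}\ n) = f_s\ n\ (\mathsf{natElim}_{\mathcal{K}}\ f_z\ f_s\ n)
\end{array}
\]


Figure 8. Extending λHOL with explicit equality (λHOL_e)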


the logic are similar to the rules →-INTRO and →-ELIM: they are syntax-directed. This means that upon seeing
the associated proof object constructor, like λx : P.π in the case of →-INTRO, we can directly tell that it applies.
If all rules were syntax-directed, it would be entirely simple to prove that the logic is sound by an inductive
argument: essentially, since no proof constructor for False exists, there is no valid derivation for False.

In this logic, the only rule that is not syntax-directed is exactly the conversion rule. Therefore, in order to
prove the soundness of the logic, we have to show that the conversion rule does not somehow introduce a proof
of False. This means that proving the soundness of the logic passes essentially through the specific relation we
have chosen for the conversion rule. Therefore, this approach is foundationally limited from supporting user
extensions, since any new extension would require a new metatheoretic result in order to make sure that it does
not violate logical soundness.

4.2  Throwing  conversion  away 

Since having a fixed conversion rule is bound to fail if we want it to be extensible, what choice are we left with,
but to throw it away? This radical-sounding approach is what we will do here. We can replace the conversion
rule by an explicit notion of equality, and provide explicit proof witnesses for rewriting based on that equality.
Essentially, all the points where the conversion rule was alluded to and proofs were omitted need now be
replaced by proof objects witnessing the equivalence. Some details for the additions required to the base λHOL
logic are shown in Fig. 8, yielding the λHOL_e logic. There are good reasons for choosing this version: first, the
proof checker is as simple as possible, and does not need to include the conversion checking routine. We could
view this routine as performing proof search over the replacement rules, so it necessarily is more complicated,
especially since it needs to be relatively efficient. Also, the metatheory of the logic itself can be simplified. Even
when the conversion rule is supported, the metatheory for the associated logic is proved through the explicit
    βNequal : (φ : ctx, T : Type, t1 : T, t2 : T) → option LT(t1 = t2)
    βNequal φ T t1 t2 =
      holcase whnf φ T t1, whnf φ T t2 of
          ((ta : T' → T) tb), (tc td) ↦
            do ⟨pf1⟩ ← βNequal φ (T' → T) ta tc
               ⟨pf2⟩ ← βNequal φ T' tb td
               return ⟨··· proof of ta tb = tc td ···⟩
        | (ta → tb), (tc → td) ↦
            do ⟨pf1⟩ ← βNequal φ Prop ta tc
               ⟨pf2⟩ ← βNequal φ Prop tb td
               return ⟨··· proof of ta → tb = tc → td ···⟩
        | (λx : T.t1), (λx : T.t2) ↦
            do ⟨pf⟩ ← βNequal [φ, x : T] Prop t1 t2
               return ⟨··· proof of λx : T.t1 = λx : T.t2 ···⟩
        | t1, t1 ↦ do return ⟨··· proof of t1 = t1 ···⟩
        | t1, t2 ↦ None

    requireEqual : (φ : ctx, T : Type, t1 : T, t2 : T) → LT(t1 = t2)
    requireEqual φ T t1 t2 =
      match βNequal φ T t1 t2 with Some x ↦ x | None ↦ error


Figure 9. VeriML tactic for checking equality up to β-conversion

equality  approach;  this  is  because  model  construction  for  a  logic  benefits  from  using  explicit  equality  [Siles  and 
Herbelin  2010]. 

Still,  this  approach  has  a  big  disadvantage:  the  proof  objects  soon  become  extremely  large,  since  they  include 
painstakingly  detailed  proofs  for  even  the  simplest  of  equivalences.  This  precludes  their  use  as  independently 
checkable  proof  certificates  that  can  be  sent  to  a  third  party.  It  is  possible  that  this  is  one  of  the  reasons  why 
systems  based  on  logics  with  explicit  equalities,  such  as  HOL4  [Slind  and  Norrish  2008]  and  Isabelle/HOL 
[Nipkow  et  al.  2002],  do  not  generate  proof  objects  by  default. 

4.3  Getting  conversion  back 

We will now see how it is possible to reconcile the explicit equality based approach with the conversion rule: we
will gain the conversion rule back, although it will remain completely outside the logic. Therefore we will be free
to extend it without risking introducing unsoundness in the logic, since the logic remains fixed
(λHOL_e as presented above).

We do this by revisiting the view of the conversion rule as a special “trusted” tactic, through the tools
presented in the previous section. First, instead of hardcoding a conversion tactic in the type checker, we program
a type-safe conversion tactic, utilizing the features of VeriML. Based on typing alone we require that it returns
a valid proof of the claimed equivalences:

    βNequal : (φ : ctx, T : Type, t : T, t' : T) → option LT(t = t')

Second, we evaluate this tactic under proof erasure semantics. This means that no proof objects are produced,
leading to the same space gains as the original conversion rule. Third, we use the staging construct in order to
check conversion statically.
    whnf : (φ : ctx, T : Type, t : T) → (t' : T) × LT(t = t')
    whnf φ T t = holcase t of
        (t1 : T' → T) (t2 : T') ↦
          let ⟨t1', pf1⟩ = whnf φ (T' → T) t1 in
          holcase t1' of
              λx : T'.tf ↦ ⟨[φ] tf/[id_φ, t2], ···⟩
            | t1' ↦ ⟨[φ] t1' t2, ···⟩
      | natElim_K fz fs n ↦
          let ⟨n', pf1⟩ = whnf φ Nat n in holcase n' of
              zero ↦ ⟨[φ] fz, ···⟩
            | succ n' ↦ ⟨[φ] fs n' (natElim_K fz fs n'), ···⟩
            | n' ↦ ⟨[φ] natElim_K fz fs n', ···⟩
      | t ↦ ⟨t, ···⟩


Figure 10. VeriML tactic for rewriting to weak head-normal form



Details. We now present our approach in more detail. First, in Fig. 9, we show a sketch of the code behind the
type-safe conversion check tactic. It works by first rewriting its input terms into weak head-normal form, via the
whnf function in Fig. 10, and then recursively checking their subterms for equality. In the equivalence checking
function, more cases are needed to deal with quantification; while in the rewriting procedure, a recursive call
is missing, which would complicate our presentation here. We also define a version of the tactic that raises an
error instead of returning an option type if we fail to prove the terms equal, which we call requireEqual. The full
details can be found in our implementation.
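The overall shape of this check (reduce both sides to weak head-normal form, then compare subterms recursively) can be sketched in Python as follows. This is only an illustration under stated assumptions: terms are untyped and use de Bruijn indices, the returned nested tuples are informal proof traces rather than checked proof objects, and the names whnf, beta_equal and require_equal merely mirror the tactics in the text. Unlike the simplified whnf of the figure, this sketch does make the recursive call after a β-step; on untyped terms it is not guaranteed to terminate.

```python
# Terms (hypothetical encoding): ("var", i), ("lam", body), ("app", f, a), ("const", name).

def shift(t, d, cutoff=0):
    """Add d to every variable index >= cutoff."""
    tag = t[0]
    if tag == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    if tag == "app":
        return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))
    return t  # constants are unaffected


def subst(t, j, s):
    """Replace variable j in t by the term s."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == j else t
    if tag == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    if tag == "app":
        return ("app", subst(t[1], j, s), subst(t[2], j, s))
    return t


def beta(body, arg):
    """One beta step: (lam. body) arg  ~>  body[0 := arg]."""
    return shift(subst(body, 0, shift(arg, 1)), -1)


def whnf(t):
    """Rewrite t to weak head-normal form."""
    if t[0] == "app":
        f = whnf(t[1])
        if f[0] == "lam":
            return whnf(beta(f[1], t[2]))
        return ("app", f, t[2])
    return t


def beta_equal(t1, t2):
    """Return a proof trace if t1 and t2 are equal up to beta, else None."""
    a, b = whnf(t1), whnf(t2)
    if a[0] == "app" and b[0] == "app":
        p1 = beta_equal(a[1], b[1])
        p2 = beta_equal(a[2], b[2]) if p1 is not None else None
        return None if p2 is None else ("cong_app", p1, p2)
    if a[0] == "lam" and b[0] == "lam":
        p = beta_equal(a[1], b[1])
        return None if p is None else ("lam_eq", p)
    return ("refl", a) if a == b else None


def require_equal(t1, t2):
    """Variant that fails loudly instead of returning None."""
    proof = beta_equal(t1, t2)
    if proof is None:
        raise ValueError("terms are not convertible")
    return proof
```

A proof-erasing evaluation of the same code would simply discard the trace and keep only the success or failure of the check.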

The code of the βNequal tactic is in fact entirely similar to the code one would write for the conversion check
routine inside a logic type checker, save for the extra types and proof objects. It therefore follows trivially that
everything that holds for the standard implementation of the conversion check also holds for this code: e.g. it
corresponds exactly to the =_βN relation as defined in the logic; it is bound to terminate because of the strong
normalization theorem for this relation; and its proof-erased version is at least as trustworthy as the standard
implementation.

Furthermore, given this code, we can produce a form of typed proof scripts inside VeriML that correspond
exactly to proof objects in the logic with the conversion rule, both in terms of their actual code, and in terms of
the steps required to validate them. This is done by constructing a proof script in VeriML by induction on the
derivation of the proof object in λHOL_c, replacing each proof object constructor by an equivalent VeriML tactic
as follows:


    constructor     to tactic       of type
    λx : P.π        Assume e        LT([φ, H : P] P') → LT(P → P')
    π1 π2           Apply e1 e2     LT(P → P') → LT(P) → LT(P')
    λx : 𝒦.π        Intro e         LT([φ, x : T] P') → LT(∀x : T, P')
    π d             Inst e a        LT(∀x : T, P) → (a : T) → LT(P/[id, a])
    c               Lift c          (H : P) → LT(P)
    (conversion)    Conversion      LT(P) → LT(P = P') → LT(P')

Here we have omitted the current logical environment φ; it is maintained through syntactic means as discussed
in Sec. 7 and through type inference. The only subtle case is conversion. Given the transformed proof e for the
proof object π contained within a use of the conversion rule, we call the conversion tactic as follows:


letstatic  pf  =  requireEqual  P  P'  in  Conversion  e  pf 

The arguments to requireEqual can be easily inferred, making crucial use of the rich type information available.
Conversion could also be used implicitly in the other tactics. Thus the resulting expression looks
identical to the original proof object.

Correspondence  with  original  proof  object.  In  order  to  elucidate  the  correspondence  between  the  resulting 
proof  script  expression  and  the  original  proof  object,  it  is  fruitful  to  view  the  proof  script  as  a  proof  certificate, 
sent  to  a  third  party.  The  steps  required  to  check  whether  it  constitutes  a  valid  proof  are  the  following.  First, 
the  whole  expression  is  checked  using  the  type  checker  of  the  computational  language.  Then,  the  calls  to  the 
requireEqual  function  are  evaluated  during  stage  one,  using  proof  erasure  semantics.  We  expect  them  to  be 
successful,  just  as  we  would  expect  the  conversion  rule  to  be  applicable  when  it  is  used.  Last,  the  rest  of  the 
tactics are evaluated; by a simple argument, based on the fact that they do not use pattern matching or
side-effects, they are guaranteed to terminate and produce a proof object in λHOL_e. This validity check is entirely
equivalent to the behavior of type-checking the λHOL_c proof object, save for pushing all conversion checks
towards the end.

4.4  Extending  conversion  at  will 

In our treatment of the conversion rule we have so far focused on regaining the βN conversion in our framework.
Still, there is nothing confining us to supporting this conversion check only. As long as we can program a
conversion tactic in VeriML that has the right type, it can safely be made part of our conversion rule.

For example, we have written an eufEqual function, which checks terms for equivalence based on the equality
with uninterpreted functions decision procedure. It is adapted from our previous work on VeriML [Stampoulis
and Shao 2010]. This equivalence checking tactic isolates hypotheses of the form d1 = d2 from the current
context, using the newly-introduced context matching support. Then, it constructs a union-find data structure in
order to form equivalence classes of terms. Based on this structure, and using code similar to βNequal (recursive
calls on subterms), we can decide whether two terms are equal up to simple uses of the equality hypotheses at
hand. We have combined this tactic with the original βNequal tactic, making the implicit equivalence supported
similar to the one in the Calculus of Congruent Constructions [Blanqui et al. 2005]. This demonstrates the
flexibility of this approach: equivalence checking is extended with a sophisticated decision procedure, which is
programmed using its original, imperative formulation. We have programmed both the rewriting procedure and
the equality checking procedure in an extensible manner, so that we can globally register further extensions.
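The decision procedure described here can be illustrated by a naive congruence closure over a union-find structure. The Python sketch below is an assumption-laden stand-in, not the VeriML eufEqual tactic: terms are strings (constants) or tuples (function symbol followed by arguments), no proofs are produced, and congruent applications are merged by a simple quadratic fixpoint loop rather than an efficient algorithm.

```python
class UnionFind:
    """Mutable union-find over hashable terms."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def subterms(t, acc):
    """Collect t and all of its subterms into the set acc."""
    acc.add(t)
    if isinstance(t, tuple):
        for arg in t[1:]:
            subterms(arg, acc)
    return acc


def euf_equal(hyps, lhs, rhs):
    """Decide lhs = rhs from equality hypotheses, treating functions as uninterpreted."""
    terms = set()
    for a, b in hyps:
        subterms(a, terms)
        subterms(b, terms)
    subterms(lhs, terms)
    subterms(rhs, terms)

    uf = UnionFind()
    for a, b in hyps:
        uf.union(a, b)

    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:  # merge congruent applications until nothing changes
        changed = False
        for i, s in enumerate(apps):
            for t in apps[i + 1:]:
                if (s[0] == t[0] and len(s) == len(t)
                        and uf.find(s) != uf.find(t)
                        and all(uf.find(x) == uf.find(y)
                                for x, y in zip(s[1:], t[1:]))):
                    uf.union(s, t)
                    changed = True
    return uf.find(lhs) == uf.find(rhs)
```

For instance, from the hypotheses f(f(f(a))) = a and f(f(f(f(f(a))))) = a this procedure concludes f(a) = a, which no amount of β-reduction alone would establish.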

4.5  Typed  proof  scripts  as  certificates 

Earlier we discussed how we can validate the proof scripts resulting from turning the conversion rule into
explicit tactic calls. This discussion shows an interesting aspect of typed proof scripts: they can be viewed as
a proof witness that is a flexible compromise between untyped proof scripts and proof objects. When a typed
proof script consists only of static calls to conversion tactics and uses of total tactics, it can be thought of as a
proof object in a logic with the corresponding conversion rule. When it also contains other tactics that perform
potentially expensive proof search, it corresponds more closely to an untyped proof script, since it needs to be
fully evaluated. Still, we are allowed to validate parts of it statically. This is especially useful when developing
the proof script, because we can avoid the evaluation of expensive tactic calls while we focus on getting the
skeleton of the proof correct.

Using proof erasure for evaluating requireEqual is only one of the choices the receiver of such a proof
certificate can make. Another choice would be to have the function return an actual proof object, which we
can check using the λHOL_e type checker. In that case, the VeriML interpreter does not need to become part of
the trusted base of the system. Last, the ‘safest possible’ choice would be to avoid doing any evaluation of the
function, and ask the proof certificate provider to do the evaluation of requireEqual themselves. In that case, no
evaluation of computational code would need to happen at the proof certificate receiver’s side. This mitigates
any concerns one might have for code execution as part of proof validity checking, and guarantees that the
small λHOL_e type checker is the trusted base in its entirety. Also, the receiver can decide on the above choices
selectively for different conversion tactics, e.g. use proof erasure for βNequal but not for eufEqual, leading to a
trusted base identical to the λHOL_c case. This means that the choice of the conversion rule rests with the proof
certificate receiver and not with the designer of the logic. Thus the proof certificate receiver can choose the level
of trust they require at will.

5.  Static  proof  scripts 

In the previous section, we have demonstrated how proof checking for typed proof scripts can be made
user-extensible, through a new treatment of the conversion rule. It makes use of user-defined, type-safe tactics, which
are evaluated statically. The question that remains is what happens with respect to proofs within tactics. If a
proof script is found within a tactic, must we wait until that evaluation point is reached to know whether the
proof script is correct or not? Or is there a way to check this statically, as soon as the tactic is defined?

In this section we show how this is possible to do in VeriML using the staging construct we have introduced.
Still, in this case matters are not as simple as evaluating certain expressions statically rather than dynamically.
The reason is that proof scripts contained within tactics mention uninstantiated meta-variables, and thus cannot
be evaluated through staging. We resolve this by showing the existence of a transformation, which “collapses”
logical terms from an arbitrary meta-variables context into the empty one.

We will focus on the case of developing conversion routines, similar to the ones we saw earlier. The ideas we
present are generally applicable when writing other types of tactics as well; we focus on conversion routines in
order to demonstrate that the two main ideas we present in this paper can work in tandem.

A rewriter for plus. We will consider the case of writing a rewriter, similar to whnf, for simplifying
expressions of the form x + y, depending on the second argument. The addition function is defined by induction
on the first argument, as follows:


    (+) = λx.λy. natElim_Nat y (λp.λr. Succ r) x

In order for rewriters to be able to use existing as well as future rewriters to perform their recursive calls, we
write them in the open recursion style: they receive a function of the same type that corresponds to the “current”
rewriter. The code looks as follows:

    rewriterType = (φ : ctx, T : Type, t : T) → (t' : T) × LT(t = t')
    plusRewriter1 : rewriterType → rewriterType
    plusRewriter1 recursive φ T t = holcase t with
        x + y ↦
          let ⟨y', ⟨pfy'⟩⟩ = recursive φ y in
          let ⟨t', ⟨pft'⟩⟩ =
            holcase y' return Σt' : [φ] Nat. LT([φ] x + y' = t') of
                0 ↦ ⟨x, ··· proof of x + 0 = x ···⟩
              | Succ y' ↦ ⟨Succ (x + y'),
                           ··· proof of x + Succ y' = Succ (x + y') ···⟩
              | y' ↦ ⟨x + y', ··· proof of x + y' = x + y' ···⟩
          in ⟨t', ⟨··· proof of x + y = t' ···⟩⟩
      | t ↦ ⟨t, ··· proof of t = t ···⟩
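The open recursion style can be illustrated outside VeriML as well. In the Python sketch below, every rewriter takes the "current" rewriter as its first argument, and a small driver ties the knot over a mutable list, so that rewriters registered later are picked up by the recursive calls of earlier ones. Proofs are omitted entirely, and make_rewrite, succ_rewriter and the tuple encoding of terms are hypothetical names introduced only for this illustration.

```python
# Terms (hypothetical encoding): ("var", name), ("zero",), ("succ", t), ("plus", a, b).
ZERO = ("zero",)


def plus_rewriter(recursive, t):
    """Simplify x + y by inspecting the (already rewritten) second argument."""
    if t[0] == "plus":
        x, y = t[1], recursive(t[2])
        if y == ZERO:
            return x                            # x + 0 = x
        if y[0] == "succ":
            return ("succ", ("plus", x, y[1]))  # x + Succ y' = Succ (x + y')
        return ("plus", x, y)
    return t


def succ_rewriter(recursive, t):
    """A second, independently written rewriter: descend under Succ."""
    if t[0] == "succ":
        return ("succ", recursive(t[1]))
    return t


def make_rewrite(rewriters):
    """Build the 'current' rewriter from a (mutable) list of open rewriters."""
    def rewrite(t):
        while True:
            new = t
            for rw in rewriters:
                new = rw(rewrite, new)  # each rewriter recurses through `rewrite`
            if new == t:
                return t
            t = new
    return rewrite
```

With only plus_rewriter registered, x + Succ (Succ 0) is rewritten one step to Succ (x + Succ 0); after appending succ_rewriter to the same list, the same call yields Succ (Succ x), without any change to plus_rewriter.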
While developing such a tactic, we can leverage the VeriML type checker to know the types of missing
proofs. But how do we fill them in? For the interesting cases of x + 0 = x and x + Succ y' = Succ (x + y'),
we would certainly need to prove the corresponding lemmas. But for the rest of the cases, the corresponding
lemmas would be uninteresting and tedious to state, such as the following for the x + y = t' case:

    lemma1 : ∀x, y, y', t'. y = y' → (x + y' = t') → x + y = t'

Stating  and  proving  such  lemmas  soon  becomes  a  hindrance  when  writing  tactics.  An  alternative  is  to  use  the 
congruence  closure  conversion  rule  to  solve  this  trivial  obligation  for  us  directly  at  the  point  where  it  is  required. 
Our  first  attempt  would  be: 

    proof of x + y = t' =
      let ⟨pf⟩ = requireEqual [φ, H1 : y = y', H2 : x + y' = t'] (x + y) t'
      in ⟨[φ] pf/[id_φ, pfy', pft']⟩

The  benefit  of  this  approach  is  evident  when  utilizing  implicit  arguments,  since  most  of  the  details  can  be 
inferred  and  therefore  omitted.  Here  we  had  to  alter  the  environment  passed  to  requireEqual,  which  includes 
several  extra  hypotheses.  Once  the  resulting  proof  has  been  computed,  the  hypotheses  are  substituted  by  the 
actual  proofs  that  we  have. 

The  problem  with  this  approach  is  two-fold:  first,  the  call  to  the  requireEqual  tactic  is  recomputed  every  time 
we  reach  that  point  of  our  function.  For  such  a  simple  tactic  call,  this  does  not  impact  the  runtime  significantly; 
still, if we could avoid it, we would be able to use more sophisticated and expensive tactics. The second problem
is  that  if  for  some  reason  the  requireEqual  is  not  able  to  prove  what  it  is  supposed  to,  we  will  not  know  until  we 
actually  reach  that  point  in  the  function. 

Moving  to  static  proofs.  This  is  where  using  the  letstatic  construct  becomes  essential.  We  can  evaluate  the 
call to requireEqual statically, during stage one interpretation. Thus we will know at the time that plusRewriter1
is defined whether the call succeeded; also, it will be replaced by a concrete value, so it will not affect the
runtime behavior of each invocation of plusRewriter1 anymore. To do that, we need to avoid mentioning any
of  the  metavariables  that  are  bound  during  runtime,  like  x,  y,  and  t'.  This  is  done  by  specifying  an  appropriate 
environment  in  the  call  to  requireEqual,  similarly  to  the  way  we  incorporated  the  extra  knowledge  above  and 
substituted  it  later.  Using  this  approach,  we  have: 

    proof of x + y = t' =
      letstatic ⟨pf⟩ =
        let φ' = [x, y, y', t' : Nat, H1 : y = y', H2 : x + y' = t'] in
        requireEqual φ' (x + y) t'
      in ⟨[φ] pf/[x/id_φ, y/id_φ, y'/id_φ, t'/id_φ, pfy'/id_φ, pft'/id_φ]⟩

What  we  are  essentially  doing  here  is  replacing  the  meta-variables  by  normal  logical  variables,  which  our 
tactics  can  deal  with.  The  meta-variable  context  is  “collapsed”  into  a  normal  context;  proofs  are  constructed 
using  tactics  in  this  environment;  last,  the  resulting  proofs  are  transported  back  into  the  desired  context  by 
substituting  meta-variables  for  variables.  We  have  explicitly  stated  the  substitutions  in  order  to  distinguish 
between  normal  logical  variables  and  meta-variables. 
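The difference between the dynamic and the static version can be mimicked in Python, where code in an enclosing function body runs once at definition time while code in the inner function runs on every call. The sketch below is only an analogy for letstatic under stated assumptions: require_equal is a toy checker that rewrites with the hypotheses in order, a "proof" is just the tuple of hypothesis names used, and instantiate plays the role of substituting the actual proofs back for the hypothesis variables.

```python
CALLS = {"require_equal": 0}  # counts how often the (pretend-expensive) check runs


def replace(t, old, new):
    """Replace every occurrence of the subterm `old` in `t` by `new`."""
    if t == old:
        return new
    if isinstance(t, tuple):
        return tuple(replace(a, old, new) for a in t)
    return t


def require_equal(hyps, lhs, rhs):
    """Toy checker: rewrite lhs with each hypothesis in turn, left to right.

    Returns the names of the hypotheses used (a proof over variables only),
    or raises if the goal is not reached.
    """
    CALLS["require_equal"] += 1
    used, cur = [], lhs
    for name, (l, r) in hyps.items():
        nxt = replace(cur, l, r)
        if nxt != cur:
            used.append(name)
            cur = nxt
    if cur != rhs:
        raise ValueError("goal not provable from the given hypotheses")
    return tuple(used)


# Collapsed context: H1 : y = y',  H2 : x + y' = t';  goal: x + y = t'.
HYPS = {"H1": ("y", "y'"), "H2": (("plus", "x", "y'"), "t'")}
GOAL = (("plus", "x", "y"), "t'")


def instantiate(proof, actual):
    """Substitute the actual proofs for the hypothesis names."""
    return tuple(actual[name] for name in proof)


def dynamic_proof(pf_y, pf_t):
    pf = require_equal(HYPS, *GOAL)  # re-checked on every invocation
    return instantiate(pf, {"H1": pf_y, "H2": pf_t})


def make_static_proof():
    pf = require_equal(HYPS, *GOAL)  # checked once; fails here if unprovable

    def static_proof(pf_y, pf_t):
        return instantiate(pf, {"H1": pf_y, "H2": pf_t})

    return static_proof
```

Calling make_static_proof() runs the check exactly once and reports a failure immediately; each later call of the returned function only performs the cheap substitution.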

The  reason  why  this  transformation  needs  to  be  done  is  that  functions  in  our  computational  language  can  only 
manipulate  logical  terms  that  are  open  with  respect  to  a  normal  variables  context;  not  logical  terms  that  are  open 
with  respect  to  the  meta-variables  context  too.  A  much  more  complicated,  but  also  more  flexible  alternative  to 
using  this  “collapsing”  trick  would  be  to  support  meta-n-variables  within  our  computational  language  directly. 

Overall,  this  approach  is  entirely  similar  to  proving  the  auxiliary  lemma  mentioned  above,  prior  to  the  tactic 
definition.  The  benefit  is  that  by  leveraging  the  type  information  together  with  type  inference,  we  can  avoid 
stating such lemmas explicitly, while retaining the same runtime behavior. We thus end up with very concise
proof  expressions  that  are  statically  validated.  We  introduce  syntactic  sugar  for  binding  a  static  proof  script 
to  a  variable,  and  then  performing  a  substitution  to  bring  it  into  the  current  context,  since  this  is  a  common 
operation. 


    ⟨e⟩_static = letstatic ⟨pf⟩ = e in ⟨[φ] pf/···⟩

Based on these, the trivial proofs in the above tactic can be filled in using a simple ⟨requireEqual⟩_static call; for
the other two we use ⟨Instantiate (NatInduction requireEqual requireEqual)⟩_static.

After we define plusRewriter1, we can register it with the global equivalence checking procedure. Thus, all
later calls to requireEqual will benefit from this simplification. It is then simple to prove commutativity for
addition:

    plusComm : LT(∀x, y. x + y = y + x)

    plusComm = NatInduction requireEqual requireEqual

Based  on  this  proof,  we  can  write  a  rewriter  that  takes  commutativity  into  account  and  uses  the  hash  values 
of  logical  terms  to  avoid  infinite  loops.  We  have  worked  on  an  arithmetic  simplification  rewriter  that  is  built  by 
layering  such  rewriters  together,  using  previous  ones  to  aid  us  in  constructing  the  proofs  required  in  later  ones. 
It  works  by  converting  expressions  into  a  list  of  monomials,  sorting  the  list  based  on  the  hash  values  of  the 
variables,  and  then  factoring  monomials  on  the  same  variable.  Also,  the  eufEqual  procedure  mentioned  earlier 
has  all  of  its  associated  proofs  automated  through  static  proof  scripts,  using  a  naive,  potentially  non-terminating, 
equality  rewriter. 
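The three steps of the arithmetic simplifier (flatten to monomials, sort by a hash of the variable, merge monomials on the same variable) can be sketched in Python as follows. This is a simplified illustration restricted to linear expressions with integer coefficients and without proof construction; the expression encoding, the use of zlib.crc32 as the hash, and the function names are assumptions made for the example, not details of the VeriML implementation.

```python
import zlib

# Expressions (hypothetical encoding):
#   ("const", n), ("var", name), ("add", e1, e2), ("scale", k, e)


def monomials(e, k=1):
    """Flatten e into (coefficient, variable) pairs; variable None marks a constant."""
    tag = e[0]
    if tag == "const":
        return [(k * e[1], None)]
    if tag == "var":
        return [(k, e[1])]
    if tag == "add":
        return monomials(e[1], k) + monomials(e[2], k)
    if tag == "scale":
        return monomials(e[2], k * e[1])
    raise ValueError(f"unknown expression: {e!r}")


def var_key(v):
    """Sort key: constants first, then variables ordered by a deterministic hash."""
    return (-1, "") if v is None else (zlib.crc32(v.encode()), v)


def normalize(e):
    """Sorted list of monomials with like variables merged and zero terms dropped."""
    ms = sorted(monomials(e), key=lambda m: var_key(m[1]))
    out = []
    for coeff, v in ms:
        if out and out[-1][1] == v:
            out[-1] = (out[-1][0] + coeff, v)  # factor monomials on the same variable
        else:
            out.append((coeff, v))
    return [m for m in out if m[0] != 0]
```

Two expressions are then considered equal when their normal forms coincide; for example x + (y + x) and 2x + y normalize to the same list, so commutativity and associativity never have to be applied step by step.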

Is collapsing always possible? A natural question to ask is whether collapsing the metavariables context into
a normal context is always possible. In order to cast this as a more formal question, we notice that the essential
step is replacing a proof object π of type [Φ] t, typed under the meta-variables environment Ψ, by a proof object
π' of type [Φ'] t' typed under the empty meta-variables environment. There needs to be a substitution so that π'
gets transported back to the Φ, Ψ environment, and has the appropriate type.

We have proved that this is possible under certain restrictions: the types of the metavariables in the current
context need to depend on the same free variables context or prefixes of that context. Also the substitutions
they are used with need to be prefixes of the identity substitution for that context. Such terms are characterized
as collapsible. We have proved that collapsible terms can be replaced using terms that do not make use of
metavariables; more details can be found in Sec. 6 and in Sec. F of the appendix.

This restriction corresponds very well to the treatment of variable contexts in the Delphin language. This
language assumes an ambient context of logical variables, instead of full, contextual modal terms. Constructs
to extend this context and substitute a specific variable exist. If this last feature is not used, the ambient context
grows monotonically and the mentioned restriction holds trivially. In our tests, this restriction has not turned out
to be limiting.

6.  Metatheory 

We have completed an extensive reworking of the metatheory of VeriML, in order to incorporate the features
that we have presented in this paper. Our new metatheory includes a number of technical advances compared
to our earlier work [Stampoulis and Shao 2010]. We will present a technical overview of our metatheory in this
section; full details can be found in the appendix.

Variable representation technique. Though our metatheory is done on paper, we have found that using a
concrete variable representation technique elucidates some aspects of how different kinds of substitutions work
in our language, compared to having normal named variables. For example, instantiating a context variable with
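Syntax of the logic

\[
\begin{array}{lrcl}
(\text{terms}) & t & ::= & s \mid c \mid f_i \mid b_i \mid \lambda(t_1).t_2 \mid t_1\ t_2 \mid \Pi(t_1).t_2 \mid t_1 = t_2 \mid \mathsf{refl}\ t \mid \mathsf{leibniz}\ t_1\ t_2 \mid \mathsf{lamEq}\ t \mid \mathsf{forallEq}\ t_1\ t_2 \mid \mathsf{betaEq}\ t_1\ t_2 \\
(\text{sorts}) & s & ::= & \mathsf{Prop} \mid \mathsf{Type} \mid \mathsf{Type}' \\
(\text{var. context}) & \Phi & ::= & \bullet \mid \Phi,\ t \\
(\text{substitutions}) & \sigma & ::= & \bullet \mid \sigma,\ t
\end{array}
\]

Example of representation:

\[
a : \mathsf{Nat} \vdash \lambda x : \mathsf{Nat}.(\lambda y : \mathsf{Nat}.\mathsf{refl}\ (\mathsf{plus}\ a\ y))\ (\mathsf{plus}\ a\ x)
\quad\Longrightarrow\quad
\mathsf{Nat} \vdash \lambda(\mathsf{Nat}).(\lambda(\mathsf{Nat}).\mathsf{refl}\ (\mathsf{plus}\ f_0\ b_0))\ (\mathsf{plus}\ f_0\ b_0)
\]

Freshen: $\lceil t \rceil^n_m$

\[
\begin{array}{lcl}
\lceil f_i \rceil^n_m & = & f_i \\
\lceil b_n \rceil^n_m & = & f_m \\
\lceil b_i \rceil^n_m & = & b_i \quad \text{when } i < n \\
\lceil \lambda(t_1).t_2 \rceil^n_m & = & \lambda(\lceil t_1 \rceil^n_m).\lceil t_2 \rceil^{n+1}_m \\
\lceil t_1\ t_2 \rceil^n_m & = & \lceil t_1 \rceil^n_m\ \lceil t_2 \rceil^n_m
\end{array}
\]

Bind: $\lfloor t \rfloor^n_m$

\[
\begin{array}{lcl}
\lfloor f_{m-1} \rfloor^n_m & = & b_n \\
\lfloor f_i \rfloor^n_m & = & f_i \quad \text{when } i < m - 1 \\
\lfloor b_i \rfloor^n_m & = & b_i \\
\lfloor \lambda(t_1).t_2 \rfloor^n_m & = & \lambda(\lfloor t_1 \rfloor^n_m).\lfloor t_2 \rfloor^{n+1}_m \\
\lfloor t_1\ t_2 \rfloor^n_m & = & \lfloor t_1 \rfloor^n_m\ \lfloor t_2 \rfloor^n_m
\end{array}
\]


(a) Hybrid deBruijn levels-deBruijn indices representation technique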


Syntax 


/l  I  Xj/a  ::=  •  |  <I>,  r  |  <I>,  (]),■  o  ::=  •  |  c,  r  |  o,  id((|),)  (indices)  I  ::=  n  |  1+  |4>i 
(ctx.kinds)  K  ::=  [<!>](  |  [Ojctx  (extension  context)  'E  ::=  •  |  'E,  A" 


(ctx.  terms)  T 
(ext.  subst.)  Oip 


[<I']r  I  [<I>]<I>' 

•  I  Of:  T 


'E;  O  h  r :  f'  (sample) 


<I>.I  =  r 
'E;  O  h  /l  :  r 


'B;  Oh  fi  :  n(r).f'  'I';<I>hf2:f 

*1';  O  h  fi  f2  :  \t')  ■  (id*, (2) 


vi'./=[0']f'  'E;  O  h  a  :  O' 

‘B;  O  h  Xi/a  -.t'  a 


'Bhr:/f 


'B;  O  h  r :  f' 
¥h  [0]f :  [0]t' 


‘B  h  O,  o'  wf 
'Bh  [0]0' :  [Ojctx 


“B  h  O  wf  (sample) 


'BhOwf  S^.!=  [Ojctx 
‘B  h  (O,  (j),)  wf 


(b)  Extension  variables:  meta-variables  and  context  variables 


Subst.  application:  t  ■  a 


C<5=  C 


/l  ■  O  =  0.1 


bjO  =  bi 


(X(ti).t2)-a  =  X(ti-a).{t2-<3) 


(ti  t2)-a  =  (ti  ■  a)  (t2  ■  o) 


Ext.  subst.  application  (sample) 


(I,  |(])i|)  •  Oip  =  (I- Ctj/),  |<I>' I  when  Oip./ =  [_]<!>' 
(o,  id((|),))-o.p  =  o-o.p,  ldo.j,,i 


(Xi/c)  ■  Oip  =  r  ■  (a  -  Oip)  when  a^/.i  =  [_]r 
(O,  i|)i)  ■  Oip  =  O  ■  Oip,  O'  when  Oip.;'  =  [_]0' 


O  h  o  :  O' 


Oh»:» 


‘B;  O  h  o  :  o' 

O  h  r :  f'  ■  o 
'B;  Oh  (o,  t)  :  (O',  t') 


O  h  o  :  o'  'V.i  =  [O']  ctx 
o',  (j),  C  O 

'B;Oh(o,  id((^,))  :  (<!>', 


_  ‘B  h  o>p  :  >B' 

'Bho.jc^B'  'Bh7’:A:o.j/ 
(selected)  Oh  (o^/,  T)  :  (‘B',  K) 


Subst.  lemmas: 


'B;Ohf:f'  'B;0'ho:0  O;  O' h  o :  O  O;  O"  h  o' :  O'  'Bhr:^:  O' h  Oq,  :  O 

O;  0'hfo:r'o  0;0"ho  o':0  O' h  T  ■  Oij/ :  ■  Oip 


(c)  Substitutions  over  logical  variables  and  extension  variables 


static 


Syntax: 

r  •  r,  X  :  X  r,  jc T  r,  ct : 

e  V.—  •  • 

■  letstatic  X  =  e  in  e' 

Limit  ctx: 

(1  ,  X  t )  static 
(r,  X  :  ? )  static 
(E,  Ot  :  static 

=  r 
=  r 

=  r 

static  5 
static 
static 


X :  t 


O;  E;  r  h  e  :  T  (part) 


•;  Z;  rjstatic  h  e  :  r  O;  Z;  r,x  T  h  c' :  l 
O;  Z;  r  h  letstatic  x  =  e  In  e' :  i 


X :  T  G  r 

O;  Z;  rhx:i: 


V  ::=  A{K).ej  \  pack  T  return  (.%)  with  v  |  ()  |  ^  :  r.Crf  |  (v,  v')  |  Inj,-  v  |  fold  v  |  /  |  Aa  :  U.c^i 

S  ::=  letstatic X  =  •  in  e' |  letstatic x  =  S  in  e' |  A(/T).S  |  ^  :  r.S  |  unpacked  (.)x.(S)  |  case(erf,  x.S,  x.e2) 

case(ej,x.ej,x.S)  \  Aa  :  k.S  |  fixx  :  T.S  |  unify  T  return  (.t)  with  (O.T'  kv  S)  |  £j[S] 

::=  £.,  T  \  pack  T  return  (.r)  with  £.,  |  unpack  £i  (.)x.(c')  |  £.,  e'  \  e^  £j  |  (£s,  e)  \  (ej,  t.,)  |  proj,-  £.,  |  inj,-  £., 

case(£j,  x.ei ,  x.e2)  |  fold  £j  |  unfold  £s  |  ref  £s  |  £,  :=  e'  I  e^  :=  £j  |  !£j  |  £s  r 

ej  ::=  all  of  e  except  letstatiox  =  e  in  e'  £  ::=  exactly  as  £.,  with  £i  — >  £  and  e  ej 


Stage  1  op.sem.: 


(b.  grf  )  — >(Me'd) 

(Ar,SM)^,(,a',S[c']) 

(/r  ,  letstatiox  =  V  in  e  ) 


( ft ,  S[letstaticx 

(b,  e[v/x]  ) 


vine])  — (^1,  S[e[v/x]]  ) 


(d)  Computational  language:  staging  support 


Figure  11.  Main  definitions  in  metatheory 
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a  concrete  context  triggers  a  set  of  potentially  complicated  a-renamings,  which  a  concrete  representation  makes 
explicit.  We  use  a  hybrid  technique  representing  bound  variables  as  deBruijn  indices,  and  free  variables  as 
deBruijn  levels.  Our  technique  is  a  small  departure  from  the  named  approach,  requiring  fewer  extra  annotations 
and  lemmas  than  normal  deBruijn  indices.  Also  it  identifies  terms  not  only  up  to  a-equivalence,  but  also  up  to 
extension  of  the  context  with  new  variables;  this  is  why  it  is  also  used  within  the  VeriML  implementation.The 
two  fundamental  operations  of  this  technique  are  freshening  and  binding,  which  are  shown  in  Fig.  11a.  Details 
can  be  found  in  section  A  of  the  appendix. 
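The freshening and binding operations of the hybrid representation can be illustrated with a minimal OCaml sketch. It is not the VeriML implementation: the type `term` and the functions `freshen` and `bind` are hypothetical names, and only a fragment of the logic (constants, variables, λ-abstraction, application) is modelled. The example term is the one used in Fig. 11a, where `a : Nat ⊢ λx : Nat.(λy : Nat. refl (plus a y)) (plus a x)` is represented with `a` as the free variable `f_0`.

```ocaml
(* Hybrid representation (illustrative fragment only):
   free variables are deBruijn levels, bound variables are deBruijn indices. *)
type term =
  | Const of string
  | FVar of int          (* f_i: i-th free variable of the current context *)
  | BVar of int          (* b_i: bound variable with deBruijn index i *)
  | Lam of term * term   (* λ(t1).t2, where t1 is the type annotation *)
  | App of term * term

(* freshen n m t: turn the bound variable b_n into the new free variable f_m.
   Assumes every free variable of t is below m. *)
let rec freshen n m t =
  match t with
  | Const _ | FVar _ -> t
  | BVar i -> if i = n then FVar m else t
  | Lam (t1, t2) -> Lam (freshen n m t1, freshen (n + 1) m t2)
  | App (t1, t2) -> App (freshen n m t1, freshen n m t2)

(* bind n m t: turn the last free variable f_(m-1) into the bound variable b_n. *)
let rec bind n m t =
  match t with
  | Const _ | BVar _ -> t
  | FVar i -> if i = m - 1 then BVar n else t
  | Lam (t1, t2) -> Lam (bind n m t1, bind (n + 1) m t2)
  | App (t1, t2) -> App (bind n m t1, bind n m t2)

(* Body of the outer λ in the example of Fig. 11a:
   (λ(Nat). refl (plus f0 b0)) (plus f0 b0) *)
let plus x y = App (App (Const "plus", x), y)
let body =
  App (Lam (Const "Nat", App (Const "refl", plus (FVar 0) (BVar 0))),
       plus (FVar 0) (BVar 0))

(* Going under the binder: the context has one variable, so b0 becomes f1. *)
let opened = freshen 0 1 body
(* Leaving the binder again: the context now has two variables. *)
let closed = bind 0 2 opened
```

In this sketch, `opened` leaves the inner `BVar 0` untouched (it belongs to the inner λ) and `closed` is structurally equal to `body`, which is the round-trip behaviour one expects from freshening followed by binding.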

Extension variables. We extend the logic with support for meta-variables and context variables; we refer to both these sorts of variables as extension variables. A meta-variable $X_i$ stands for a contextual term $T = [\Phi]\,t$, which packages a term together with the context it inhabits. Context variables $\phi_i$ stand for a context $\Phi$, and are used to "weaken" parametric contexts in specific positions. Both kinds of variables are needed to support manipulation of open logical terms. Details of their definition and typing are shown in Fig. 11b. We use the same hybrid approach as above for representing these variables. A somewhat subtle aspect of this extension is that we generalize the deBruijn levels $I$ used to index free variables, in order to deal effectively with parametric contexts.

Substitutions. The hybrid representation technique we use for variables makes simultaneous substitutions for all variables in scope the most natural choice. In Fig. 11c, we show some example rules of how to apply a full simultaneous substitution $\sigma$ to a term $t$, denoted as $t \cdot \sigma$. Similarly, we define full simultaneous substitutions $\sigma_\Psi$ for extension contexts; defining their application has a very natural description, because of our variable representation technique. We prove a number of substitution lemmas which have simple statements, as shown in Fig. 11c. The proofs of these lemmas comprise the main effort required in proving the type-safety of a computational language such as the one we support, as they represent the point where computation specific to logical term manipulation takes place. Details can be found in section B of the appendix.
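A full simultaneous substitution over the hybrid representation can be sketched in OCaml as a list of terms, one per free variable in scope. This is an illustration under assumptions, not the paper's code: `term`, `subst`, `apply`, and `id_subst` are hypothetical names, and only a fragment of the logic is modelled.

```ocaml
type term =
  | Const of string
  | FVar of int          (* free variable, deBruijn level *)
  | BVar of int          (* bound variable, deBruijn index *)
  | Lam of term * term
  | App of term * term

(* A substitution σ is a list whose i-th element (0-based) is σ.i. *)
type subst = term list

(* apply t s computes t · σ: free variables are looked up by level,
   bound variables are untouched, so no shifting is needed under binders. *)
let rec apply (t : term) (s : subst) : term =
  match t with
  | Const _ | BVar _ -> t
  | FVar i -> List.nth s i
  | Lam (t1, t2) -> Lam (apply t1 s, apply t2 s)
  | App (t1, t2) -> App (apply t1 s, apply t2 s)

(* Identity substitution for a context of length n: f_0, ..., f_(n-1). *)
let id_subst (n : int) : subst = List.init n (fun i -> FVar i)

let example = Lam (Const "Nat", App (FVar 1, App (FVar 0, BVar 0)))
```

Because free variables are levels, extending a substitution on the right does not change the result of applying it to a term that only mentions earlier variables; for instance, applying `[Const "a"; Const "b"; Const "c"]` to `example` gives the same term as applying `[Const "a"; Const "b"]`.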

Computational language. We define an ML-style computational language that supports dependent functions and dependent pairs over contextual terms $T$, as well as pattern matching over them. Lack of space precludes us from including details here; full details can be found in section C of the appendix. A fairly complete ML calculus is supported, with mutable references and recursive types. Type safety is proved using standard techniques; its central point is extending the logic substitution lemmas to expressions and using them to prove progress and preservation of dependent functions and dependent pairs. This proof is modular with respect to the logic, and other logics can easily be supported.

Pattern matching. Our metatheory includes many extensions in the pattern matching that is supported, as well as a new approach for dealing with typing patterns. We include support for pattern matching over contexts (e.g. to pick out hypotheses from the context) and for non-linear patterns. The allowed patterns are checked through a restriction of the usual typing rules, $\Psi \vdash_p T : K$.

The essential idea behind our approach to pattern matching is to identify what the relevant variables in a typing derivation are. Since contexts are ordered, "removing" non-relevant variables amounts to replacing their definitions in the context with holes, which leads us to partial contexts $\hat{\Psi}$. The corresponding notion of partial substitutions is denoted as $\hat{\sigma}_\Psi$. Our main theorem about pattern matching can then be stated as:

Theorem 6.1 (Decidability of pattern matching) If $\Psi \vdash_p T : K$, $\bullet \vdash_p T' : K$ and $\mathit{relevant}(\Psi;\ \Phi \vdash T : K) = \hat{\Psi}$, then either there exists a unique partial substitution $\hat{\sigma}_\Psi$ such that $\bullet \vdash \hat{\sigma}_\Psi : \hat{\Psi}$ and $T \cdot \hat{\sigma}_\Psi = T'$, or no such substitution exists.

Details are found in section D of the appendix.
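The shape of Theorem 6.1 (matching either yields a unique partial substitution or fails) can be illustrated with a deliberately simplified OCaml sketch. It is an assumption-laden toy, not the algorithm of the paper: it ignores contexts, binders, and typing, and the names `term`, `Meta`, and `matching` are hypothetical. It does handle non-linear patterns, and it leaves a hole (`None`) for every pattern variable that does not occur in the pattern.

```ocaml
type term =
  | Const of string
  | App of term * term
  | Meta of int            (* pattern variable X_i; not allowed in scrutinees *)

(* matching nvars pat t tries to find a partial substitution s, represented
   as an array with one optional entry per pattern variable, such that
   instantiating pat with s gives exactly t. *)
let matching (nvars : int) (pat : term) (t : term) : term option array option =
  let s = Array.make nvars None in
  let rec go p t =
    match p, t with
    | Meta i, _ ->
        (match s.(i) with
         | None -> s.(i) <- Some t; true
         (* non-linear pattern: repeated occurrences must agree *)
         | Some t' -> t' = t)
    | Const c, Const c' -> c = c'
    | App (p1, p2), App (t1, t2) -> go p1 t1 && go p2 t2
    | _ -> false
  in
  if go pat t then Some s else None

let plus x y = App (App (Const "plus", x), y)

(* The non-linear pattern  plus X_0 X_0, in a scope with two pattern variables. *)
let pattern = plus (Meta 0) (Meta 0)
let hit = matching 2 pattern (plus (Const "a") (Const "a"))
let miss = matching 2 pattern (plus (Const "a") (Const "b"))
```

In this sketch the result is determined by the pattern and the scrutinee alone, which mirrors the uniqueness part of the theorem for this restricted setting; the unused variable `X_1` stays a hole in the successful result.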
Staging. Our development in this paper critically depends on the letstatic construct we presented earlier. It can be seen as a dual of the traditional box construct of Davies and Pfenning [1996]. Details of its typing and semantics are shown in Fig. 11d. We define a notion of "static evaluation contexts" $S$, which enclose a hole of the form letstatic $x = \bullet$ in $e$. They include normal evaluation contexts, as well as evaluation contexts under binding structures. We evaluate expressions $e$ that include staging constructs using the $\longrightarrow_s$ relation; internally, this uses the normal evaluation rules, which are used in the second stage as well, for evaluating expressions that do not include other staging constructs. If stage-one evaluation is successful, we are left with a residual dynamic configuration $(\mu, e_d)$ which is then evaluated normally. We prove type-safety for stage-one evaluation; its statement follows.

Theorem 6.2 (Stage-one Type Safety) If $\bullet;\ \Sigma;\ \bullet \vdash e : \tau$ then: either $e$ is a dynamic expression $e_d$, or, for every store $\mu$ such that $\vdash \mu : \Sigma$, we have: either $(\mu, e) \longrightarrow_s \mathsf{error}$, or there exist an $e'$, a new store typing $\Sigma' \supseteq \Sigma$ and a new store $\mu'$ such that: $(\mu, e) \longrightarrow_s (\mu', e')$; $\vdash \mu' : \Sigma'$; and $\bullet;\ \Sigma';\ \bullet \vdash e' : \tau$.

Details are found in section E of the appendix.
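The two-stage reading of letstatic can be sketched with a toy OCaml evaluator. It rests on several assumptions and is far simpler than the language of the paper: there is no store, no typing, and no contextual terms; `expr`, `subst`, `eval`, and `stage_one` are hypothetical names. Stage one looks for `LetStatic` everywhere, including under binders, evaluates the static expression with the ordinary evaluator, and substitutes the resulting value, leaving a residual expression without staging constructs.

```ocaml
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Fun of string * expr
  | App of expr * expr
  | LetStatic of string * expr * expr   (* letstatic x = e in e' *)

(* subst x v e: replace x by the closed value v in e (v is closed, so no
   variable capture can occur). *)
let rec subst x v e =
  match e with
  | Int _ -> e
  | Var y -> if y = x then v else e
  | Add (a, b) -> Add (subst x v a, subst x v b)
  | Fun (y, b) -> if y = x then e else Fun (y, subst x v b)
  | App (a, b) -> App (subst x v a, subst x v b)
  | LetStatic (y, a, b) ->
      LetStatic (y, subst x v a, (if y = x then b else subst x v b))

(* Ordinary (second-stage) call-by-value evaluation of closed expressions. *)
let rec eval e =
  match e with
  | Int _ | Fun _ -> e
  | Add (a, b) ->
      (match eval a, eval b with
       | Int m, Int n -> Int (m + n)
       | _ -> failwith "type error")
  | App (f, a) ->
      (match eval f with
       | Fun (x, body) -> eval (subst x (eval a) body)
       | _ -> failwith "type error")
  | Var x -> failwith ("unbound variable " ^ x)
  | LetStatic _ -> failwith "staging construct in dynamic expression"

(* Stage one: evaluate every letstatic, even under a function binder.
   A static expression that mentions a dynamic variable makes eval fail. *)
let rec stage_one e =
  match e with
  | Int _ | Var _ -> e
  | Add (a, b) -> Add (stage_one a, stage_one b)
  | Fun (x, b) -> Fun (x, stage_one b)
  | App (a, b) -> App (stage_one a, stage_one b)
  | LetStatic (x, a, b) ->
      let v = eval (stage_one a) in
      stage_one (subst x v b)

(* fun y -> letstatic x = 1 + 2 in x + y *)
let prog = Fun ("y", LetStatic ("x", Add (Int 1, Int 2), Add (Var "x", Var "y")))
let residual = stage_one prog
```

Here `residual` is the function `fun y -> 3 + y`: the static addition has already been performed once, before the function is ever called, which is the behaviour that lets proof scripts inside tactics be checked at definition time.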

Collapsing extension variables. Last, we have proved that under the conditions described in Sec. 5, it is possible to collapse a term $t$ into a term $t'$ which is typed under the empty extension variables context; a substitution $\sigma$ with which we can regain the original term $t$ exists. This suggests that whenever a proof object $t$ for a specific proposition is required, an equivalent proof object that does not mention uninstantiated extension variables exists. Therefore, we can write an equivalent proof script producing the collapsed proof object instead, and evaluate that script statically. The statement of this theorem is the following:

Theorem 6.3 If $\Psi \vdash [\Phi]\,t : [\Phi]\,t_T$ and $\mathit{collapsible}(\Psi \vdash [\Phi]\,t : [\Phi]\,t_T)$, then there exist $\Phi'$, $t'$, $t'_T$ and $\sigma$ such that $\bullet \vdash \Phi'\ \mathit{wf}$, $\bullet \vdash [\Phi']\,t' : [\Phi']\,t'_T$, $\Psi;\ \Phi \vdash \sigma : \Phi'$, $t' \cdot \sigma = t$ and $t'_T \cdot \sigma = t_T$.

The main idea behind the proof is to maintain a number of substitutions and their inverses: one to go from a general extension context $\Psi$ into an "equivalent" context $\Psi'$, which includes only definitions of the form $[\Phi]\,t$, for a constant context $\Phi$ that uses no extension variables. Then, another substitution and its inverse are maintained to go from that extension variables context into the empty one; this is simpler, since terms typed under $\Psi'$ are already essentially free of metavariables. The computational content within the proof amounts to a procedure for transforming proof scripts inside tactics into static proof scripts. Details are found in section F of the appendix.

7.  Implementation 

We have completed a prototype implementation of the VeriML language, as described in this paper, that supports all of our claims. We have built on our existing prototype [Stampoulis and Shao 2010] and have added an extensive set of new features and improvements. The prototype is written in OCaml and is about 6k lines of code. Using the prototype we have implemented a number of examples, which are about 1.5k lines of code. Readers are encouraged to download and try the prototype from http://flint.cs.yale.edu/publications/supc.html.

New features. We have implemented the new features we have described so far: context matching, non-linear patterns, proof-erasure semantics, staging, and inferencing for logical and computational terms. Proof-erasure semantics are utilized only if requested by a per-function flag, enabling us to selectively "trust" tactics. The staging construct we support is more akin to the $(\cdot)_{\mathsf{static}}$ form described as syntactic sugar in Sec. 5, and it is able to infer the collapsing substitutions that are needed, following the approach used in our metatheory.

Changes. We have also changed quite a number of things in the prototype and improved many of its aspects. A central change, mediated by our new treatment of the conversion rule, was to modify the logic used in order to adopt the explicit equality approach; the existing prototype used the λHOLc logic. We also switched the variable representation to the hybrid deBruijn levels-deBruijn indices technique we described, which enabled us to implement subtyping based on context subsumption. Also, we have adapted the typing rules of the pattern matching construct in order to support refining the environment based on the current branch.

Examples implemented. We have implemented a number of examples to support our claims. First, we have written the type-safe conversion check routine for βN, and extended it to support congruence closure based on equalities in the context. Proofs of this latter tactic are constructed automatically through static proof scripts, using a naive rewriter that is non-terminating in the general case. We have also completed proofs for theorems of arithmetic for the properties of addition and multiplication, and used them to write an arithmetic simplification tactic. All of the theorems are proved by making essential use of existing conversion rules, and are immediately added into new conversion rules, leading to a compact and clean development style. The resulting code does not need to make use of translation validation or proof by reflection, which are typically used to implement similar tactics in existing proof assistants.

Towards a practical proof assistant. In order to facilitate practical proof and program construction in VeriML, we introduced some features to support surface syntax, enabling users to omit most details about the environments of contextual terms and the substitutions used with meta-variables. This syntax follows the style of Delphin, assuming an ambient logical variable environment which is extended through a construct denoted as $\nu x : t.\,e$. Still, the full power of contextual modal type theory is available, which is crucial in order to change what the current ambient environment is, used, as we saw earlier, for static calls to tactics. In general the surface syntax leads to much more concise and readable code.

Last, we introduced syntax support for calls to tactics, enabling users to write proof expressions that look very similar to proof scripts in current proof assistants. We developed a rudimentary ProofGeneral mode for VeriML that enables us to call the VeriML type-checker and interpreter for parts of source files. By adding holes to our sources, we can be informed by the type inference mechanism about their expected types. Those types correspond to what the current "proof state" is at that point. Therefore, a possible workflow for developing tactics or proofs is writing the known parts, inserting holes in missing points to know what remains to be proved, and calling the typechecker to get the proof state information. This workflow corresponds closely to the interactive proof development support in proof assistants like Coq and Isabelle, but generalizes it to the case of tactics as well.

8.  Related  work 

There is a large body of work that is related to the ideas we have presented here.

Techniques for robust proof development. There have been multiple proposals for making proof development inside existing proof assistants more robust. A well-known technique is proof-by-reflection [Boutin 1997]: writing total and certified decision procedures within the functional language contained in a logic like CIC. A recently introduced technique is automation through canonical structures [Gonthier et al. 2011]: the resolution mechanism for finding instances of canonical structures (a generalization of type classes) is cleverly utilized in order to program automation procedures for specific classes of propositions. We view both approaches as somewhat similar, as both are based on cleverly exploiting static "interpreters" that are available in a modern proof assistant: the partial evaluator within the conversion rule in the former case; the unification algorithm within instance discovery in the latter case.

Our approach can thus be seen as similar, but also as a generalization of these approaches, since a general-purpose programming model is supported. Therefore, users do not have to adapt to a specific programming style for writing automation code, but can rather use a familiar functional language. Proof-by-reflection could perhaps be used to support the same kind of extensions to the conversion rule; still, this would require reflecting a large part of the logic in itself, through a prohibitively complicated encoding. Both techniques are applicable to our setting as well and could be used to provide benefits to large developments within our language.

The style advocated in Chlipala [2011] (and elsewhere) suggests that proper proof engineering entails developing sophisticated automation tactics in a modular style, and extending their power by adding proved lemmas as hints. We are largely inspired by this approach, and believe that our introduction of the extensible conversion rule and static checking of tactics can significantly benefit it. We demonstrate similar ideas in layering conversion tactics.

Traditional proof assistants. There are many parallels of our work with the LCF family of proof assistants, like HOL4 [Slind and Norrish 2008] and HOL-Light [Harrison 1996], which have served as inspiration. First, the foundational logic that we use is similar. Also, our use of a dedicated ML-like programming language to program tactics and proof scripts is similar to the approach taken by HOL4 and HOL-Light. Last, the fact that no proof objects need to be generated is shared. Still, checking a proof script in HOL requires evaluating it fully. Using our approach, we can selectively evaluate parts of proof scripts; we focus on conversion-like tactics, but we are not limited inherently to those. This is only possible because our proof scripts carry proof state information within their types. Similarly, proof scripts contained within LCF tactics cannot be evaluated statically, so it is impossible to establish their validity upon tactic definition. It is possible to do a transformation similar to ours manually (lifting proof scripts into auxiliary lemmas that are proved prior to the tactic), but the lack of type information means that many more details need to be provided.

The Coq proof assistant [Barras et al. 2010] is another obvious point of reference for our work. We will focus on the conversion rule that CIC, its accompanying logic, supports; the same problems with respect to proof scripts and tactics that we described in the LCF case also apply to Coq. The conversion rule, which identifies computationally equivalent propositions, coupled with the rich type universe available, opens up many possibilities for constructing small and efficiently checkable proof objects. The implementation of the conversion rule needs to be part of the trusted base of the proof assistant. Also, the fact that the conversion check is built into the proof assistant makes the supported equivalence rigid and non-extensible by frequently used decision procedures.

There is a large body of work that aims to extend the conversion rule to arbitrary confluent rewrite systems (e.g. Blanqui et al. [1999]) and to include decision procedures [Strub 2010]. These approaches assume some small or larger addition to the trusted base, and extend the already complex metatheory of Coq. Furthermore, the NuPRL proof assistant [Constable et al. 1986] is based on extensional type theory, which includes an extensional conversion rule. This enables complex decision procedures to be part of conversion, but it results in a very large trusted base. We show how, for a subset of these type theories, the conversion check can be recovered outside the trusted base. It can be extended with arbitrarily complex new tactics, written in a familiar programming style, without any metatheoretic additions and without hurting the soundness of the logic. The question of whether these type theories can be supported in full remains as future work, but as far as we know, there is no inherent limitation to our approach.

Dependently-typed programming. The large body of work on dependently-typed languages has close parallels to our work. Out of the multitude of proposals, we consider the Russell framework [Sozeau 2006] as the current state-of-the-art, because of its high expressivity and automation in discharging proof obligations. In our setting, we can view dependently-typed programming as a specific case of tactics producing complex data types that include proof objects. Static proof scripts can be leveraged to support expressivity similar to the Russell framework. Furthermore, our approach opens up a new intriguing possibility: dependently-typed programs whose obligations are discharged statically and automatically, through code written within the same language.

Last, we have been largely inspired by the work on languages like Beluga [Pientka and Dunfield 2008] and Delphin [Poswolsky and Schürmann 2008], and build on our previous work on VeriML [Stampoulis and Shao 2010]. We investigate how to leverage type-safe tactics, as well as a number of new constructs we introduce, so as to offer an extensible notion of proof checking. Also, we address the issue of statically checking the proof scripts contained within tactics written in VeriML. As far as we know, our development is the first time languages such as these have been demonstrated to provide a workflow similar to interactive proof assistants.
Appendices

A. The logic λHOLc

Definition A.1 (Syntax of the language) The syntax of the logic language is given below.

\[
\begin{aligned}
t &::= s \mid c \mid f_i \mid b_i \mid \lambda(t_1).t_2 \mid t_1\ t_2 \mid \Pi(t_1).t_2 \mid t_1 = t_2 \mid \mathsf{conv}\ t_1\ t_2 \mid \mathsf{refl}\ t \mid \mathsf{symm}\ t \mid \mathsf{trans}\ t_1\ t_2 \\
  &\phantom{::=}\ \mid \mathsf{congapp}\ t_1\ t_2 \mid \mathsf{congimpl}\ t_1\ t_2 \mid \mathsf{conglam}\ t \mid \mathsf{congpi}\ t \mid \mathsf{beta}\ t_1\ t_2 \\
s &::= \mathsf{Prop} \mid \mathsf{Type} \mid \mathsf{Type}' \\
\Phi &::= \bullet \mid \Phi,\ t \\
\sigma &::= \bullet \mid \sigma,\ t \\
\Sigma &::= \bullet \mid \Sigma,\ c : t
\end{aligned}
\]

We use $f_i$ to denote the $i$-th free variable in the current environment and $b_i$ for the bound variable with deBruijn index $i$. The benefit of this approach is that the representation of terms is unique both up to α-equivalence and up to extensions of the current free variables context.

Definition A.2 (Context length and access) Getting the length of a context, and an element out of a context, are defined as follows. In the case of element access, we assume that $i < |\Phi|$.

\[
\begin{aligned}
|\bullet| &= 0 \\
|\Phi,\ t| &= |\Phi| + 1 \\
(\Phi,\ t).|\Phi| &= t \\
(\Phi,\ t).i &= \Phi.i
\end{aligned}
\]

Definition A.3 (Substitution length) Getting the length of a substitution is defined as follows.

\[
\begin{aligned}
|\bullet| &= 0 \\
|\sigma,\ t| &= |\sigma| + 1
\end{aligned}
\]

Definition A.4 (Substitution access) The operation of accessing the $i$-th term out of a substitution is defined as follows. We assume that $i < |\sigma|$.

\[
\begin{aligned}
(\sigma,\ t).|\sigma| &= t \\
(\sigma,\ t).i &= \sigma.i
\end{aligned}
\]

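Definition A.5 (Substitution application) The operation of applying a substitution is defined as follows.

Application to terms, $t \cdot \sigma$:

\[
\begin{aligned}
s \cdot \sigma &= s \\
c \cdot \sigma &= c \\
f_i \cdot \sigma &= \sigma.i \\
b_i \cdot \sigma &= b_i \\
(\lambda(t_1).t_2) \cdot \sigma &= \lambda(t_1 \cdot \sigma).(t_2 \cdot \sigma) \\
(t_1\ t_2) \cdot \sigma &= (t_1 \cdot \sigma)\ (t_2 \cdot \sigma) \\
(\Pi(t_1).t_2) \cdot \sigma &= \Pi(t_1 \cdot \sigma).(t_2 \cdot \sigma) \\
(t_1 = t_2) \cdot \sigma &= (t_1 \cdot \sigma) = (t_2 \cdot \sigma) \\
(\mathsf{conv}\ t_1\ t_2) \cdot \sigma &= \mathsf{conv}\ (t_1 \cdot \sigma)\ (t_2 \cdot \sigma) \\
(\mathsf{refl}\ t) \cdot \sigma &= \mathsf{refl}\ (t \cdot \sigma) \\
(\mathsf{symm}\ t) \cdot \sigma &= \mathsf{symm}\ (t \cdot \sigma) \\
(\mathsf{trans}\ t_1\ t_2) \cdot \sigma &= \mathsf{trans}\ (t_1 \cdot \sigma)\ (t_2 \cdot \sigma) \\
(\mathsf{congapp}\ t_1\ t_2) \cdot \sigma &= \mathsf{congapp}\ (t_1 \cdot \sigma)\ (t_2 \cdot \sigma) \\
(\mathsf{congimpl}\ t_1\ t_2) \cdot \sigma &= \mathsf{congimpl}\ (t_1 \cdot \sigma)\ (t_2 \cdot \sigma) \\
(\mathsf{conglam}\ t) \cdot \sigma &= \mathsf{conglam}\ (t \cdot \sigma) \\
(\mathsf{congpi}\ t) \cdot \sigma &= \mathsf{congpi}\ (t \cdot \sigma) \\
(\mathsf{beta}\ t_1\ t_2) \cdot \sigma &= \mathsf{beta}\ (t_1 \cdot \sigma)\ (t_2 \cdot \sigma)
\end{aligned}
\]

Application to substitutions, $\sigma' \cdot \sigma$:

\[
\begin{aligned}
\bullet \cdot \sigma &= \bullet \\
(\sigma',\ t) \cdot \sigma &= (\sigma' \cdot \sigma),\ (t \cdot \sigma)
\end{aligned}
\]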


Definition A.6 (Identity substitution) The identity substitution is defined as follows.

\[
\begin{aligned}
id_{\bullet} &= \bullet \\
id_{\Phi,\ t} &= id_{\Phi},\ f_{|\Phi|}
\end{aligned}
\]


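Definition A.7 Free and bound variable limits for terms are defined as follows.

Free variable limit, $t <^f n$:

\[
\begin{aligned}
& s <^f n \\
& c <^f n \\
& f_i <^f n &&\Leftarrow\ n > i \\
& b_i <^f n \\
& (\lambda(t_1).t_2) <^f n &&\Leftarrow\ t_1 <^f n \wedge t_2 <^f n \\
& t_1\ t_2 <^f n &&\Leftarrow\ t_1 <^f n \wedge t_2 <^f n
\end{aligned}
\]

Bound variable limit, $t <^b n$:

\[
\begin{aligned}
& s <^b n \\
& c <^b n \\
& f_i <^b n \\
& b_i <^b n &&\Leftarrow\ n > i \\
& (\lambda(t_1).t_2) <^b n &&\Leftarrow\ t_1 <^b n \wedge t_2 <^b n + 1 \\
& t_1\ t_2 <^b n &&\Leftarrow\ t_1 <^b n \wedge t_2 <^b n
\end{aligned}
\]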
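Definition A.8 Free and bound variable limits for substitutions are defined as follows.

Free variable limit, $\sigma <^f n$:

\[
\begin{aligned}
& \bullet <^f n \\
& (\sigma,\ t) <^f n &&\Leftarrow\ \sigma <^f n \wedge t <^f n
\end{aligned}
\]

Bound variable limit, $\sigma <^b n$:

\[
\begin{aligned}
& \bullet <^b n \\
& (\sigma,\ t) <^b n &&\Leftarrow\ \sigma <^b n \wedge t <^b n
\end{aligned}
\]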


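Definition A.9 (Freshening) Freshening a term, written $\lceil t \rceil^n_m$, is defined as follows. We assume that $t <^f m$ and $t <^b n + 1$.

\[
\begin{aligned}
\lceil s \rceil^n_m &= s \\
\lceil c \rceil^n_m &= c \\
\lceil f_i \rceil^n_m &= f_i \\
\lceil b_n \rceil^n_m &= f_m \\
\lceil b_i \rceil^n_m &= b_i \quad \text{when } i < n \\
\lceil \lambda(t_1).t_2 \rceil^n_m &= \lambda(\lceil t_1 \rceil^n_m).\lceil t_2 \rceil^{n+1}_m \\
\lceil t_1\ t_2 \rceil^n_m &= \lceil t_1 \rceil^n_m\ \lceil t_2 \rceil^n_m \\
\lceil \Pi(t_1).t_2 \rceil^n_m &= \Pi(\lceil t_1 \rceil^n_m).\lceil t_2 \rceil^{n+1}_m \\
\lceil t_1 = t_2 \rceil^n_m &= \lceil t_1 \rceil^n_m = \lceil t_2 \rceil^n_m \\
\lceil \mathsf{conv}\ t_1\ t_2 \rceil^n_m &= \mathsf{conv}\ \lceil t_1 \rceil^n_m\ \lceil t_2 \rceil^n_m \\
\lceil \mathsf{refl}\ t \rceil^n_m &= \mathsf{refl}\ \lceil t \rceil^n_m \\
\lceil \mathsf{symm}\ t \rceil^n_m &= \mathsf{symm}\ \lceil t \rceil^n_m \\
\lceil \mathsf{trans}\ t_1\ t_2 \rceil^n_m &= \mathsf{trans}\ \lceil t_1 \rceil^n_m\ \lceil t_2 \rceil^n_m \\
\lceil \mathsf{congapp}\ t_1\ t_2 \rceil^n_m &= \mathsf{congapp}\ \lceil t_1 \rceil^n_m\ \lceil t_2 \rceil^n_m \\
\lceil \mathsf{congimpl}\ t_1\ t_2 \rceil^n_m &= \mathsf{congimpl}\ \lceil t_1 \rceil^n_m\ \lceil t_2 \rceil^n_m \\
\lceil \mathsf{conglam}\ t \rceil^n_m &= \mathsf{conglam}\ \lceil t \rceil^n_m \\
\lceil \mathsf{congpi}\ t \rceil^n_m &= \mathsf{congpi}\ \lceil t \rceil^n_m \\
\lceil \mathsf{beta}\ t_1\ t_2 \rceil^n_m &= \mathsf{beta}\ \lceil t_1 \rceil^n_m\ \lceil t_2 \rceil^n_m
\end{aligned}
\]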


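Definition A.10 (Binding) Binding a term, written $\lfloor t \rfloor^n_m$, is defined as follows. We assume that $t <^f m$ and $t <^b n$.

\[
\begin{aligned}
\lfloor s \rfloor^n_m &= s \\
\lfloor c \rfloor^n_m &= c \\
\lfloor f_{m-1} \rfloor^n_m &= b_n \\
\lfloor f_i \rfloor^n_m &= f_i \quad \text{when } i < m - 1 \\
\lfloor b_i \rfloor^n_m &= b_i \\
\lfloor \lambda(t_1).t_2 \rfloor^n_m &= \lambda(\lfloor t_1 \rfloor^n_m).\lfloor t_2 \rfloor^{n+1}_m \\
\lfloor t_1\ t_2 \rfloor^n_m &= \lfloor t_1 \rfloor^n_m\ \lfloor t_2 \rfloor^n_m \\
\lfloor \Pi(t_1).t_2 \rfloor^n_m &= \Pi(\lfloor t_1 \rfloor^n_m).\lfloor t_2 \rfloor^{n+1}_m \\
\lfloor t_1 = t_2 \rfloor^n_m &= \lfloor t_1 \rfloor^n_m = \lfloor t_2 \rfloor^n_m \\
\lfloor \mathsf{conv}\ t_1\ t_2 \rfloor^n_m &= \mathsf{conv}\ \lfloor t_1 \rfloor^n_m\ \lfloor t_2 \rfloor^n_m \\
\lfloor \mathsf{refl}\ t \rfloor^n_m &= \mathsf{refl}\ \lfloor t \rfloor^n_m \\
\lfloor \mathsf{symm}\ t \rfloor^n_m &= \mathsf{symm}\ \lfloor t \rfloor^n_m \\
\lfloor \mathsf{trans}\ t_1\ t_2 \rfloor^n_m &= \mathsf{trans}\ \lfloor t_1 \rfloor^n_m\ \lfloor t_2 \rfloor^n_m \\
\lfloor \mathsf{congapp}\ t_1\ t_2 \rfloor^n_m &= \mathsf{congapp}\ \lfloor t_1 \rfloor^n_m\ \lfloor t_2 \rfloor^n_m \\
\lfloor \mathsf{congimpl}\ t_1\ t_2 \rfloor^n_m &= \mathsf{congimpl}\ \lfloor t_1 \rfloor^n_m\ \lfloor t_2 \rfloor^n_m \\
\lfloor \mathsf{conglam}\ t \rfloor^n_m &= \mathsf{conglam}\ \lfloor t \rfloor^n_m \\
\lfloor \mathsf{congpi}\ t \rfloor^n_m &= \mathsf{congpi}\ \lfloor t \rfloor^n_m \\
\lfloor \mathsf{beta}\ t_1\ t_2 \rfloor^n_m &= \mathsf{beta}\ \lfloor t_1 \rfloor^n_m\ \lfloor t_2 \rfloor^n_m
\end{aligned}
\]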

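\[
\dfrac{c : t \in \Sigma}{\Phi \vdash_\Sigma c : t}
\qquad
\dfrac{\Phi \vdash t_1 : t \quad \Phi \vdash t_2 : t \quad \Phi \vdash t : \mathsf{Type}}{\Phi \vdash t_1 = t_2 : \mathsf{Prop}}
\]

\[
\dfrac{\Phi.i = t}{\Phi \vdash f_i : t}
\qquad
\dfrac{(s, s') \in \mathcal{A}}{\Phi \vdash s : s'}
\qquad
\dfrac{\Phi \vdash t_1 : s \quad \Phi,\ t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : s' \quad (s, s', s'') \in \mathcal{R}}{\Phi \vdash \Pi(t_1).t_2 : s''}
\]

\[
\dfrac{\Phi,\ t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : t' \quad \Phi \vdash \Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1} : s}{\Phi \vdash \lambda(t_1).t_2 : \Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}}
\qquad
\dfrac{\Phi \vdash t_1 : \Pi(t).t' \quad \Phi \vdash t_2 : t}{\Phi \vdash t_1\ t_2 : \lceil t' \rceil_{|\Phi|} \cdot (id_\Phi,\ t_2)}
\]

\[
\dfrac{\Phi \vdash t : t_1 \quad \Phi \vdash t_1 : \mathsf{Prop} \quad \Phi \vdash t' : t_1 = t_2}{\Phi \vdash \mathsf{conv}\ t\ t' : t_2}
\qquad
\dfrac{\Phi \vdash t_1 : t \quad \Phi \vdash t_1 = t_1 : \mathsf{Prop}}{\Phi \vdash \mathsf{refl}\ t_1 : t_1 = t_1}
\qquad
\dfrac{\Phi \vdash t_a : t_1 = t_2}{\Phi \vdash \mathsf{symm}\ t_a : t_2 = t_1}
\]

\[
\dfrac{\Phi \vdash t_a : t_1 = t_2 \quad \Phi \vdash t_b : t_2 = t_3}{\Phi \vdash \mathsf{trans}\ t_a\ t_b : t_1 = t_3}
\]

\[
\dfrac{\Phi \vdash t_a : M_1 = M_2 \quad \Phi \vdash M_1 : A \to B \quad \Phi \vdash t_b : N_1 = N_2 \quad \Phi \vdash N_1 : A}{\Phi \vdash \mathsf{congapp}\ t_a\ t_b : M_1\ N_1 = M_2\ N_2}
\]

\[
\dfrac{\Phi \vdash t_a : A_1 = A_2 \quad \Phi,\ A_1 \vdash \lceil t_b \rceil_{|\Phi|} : B_1 = B_2 \quad \Phi \vdash A_1 : \mathsf{Prop} \quad \Phi,\ A_1 \vdash \lceil B_1 \rceil_{|\Phi|} : \mathsf{Prop}}{\Phi \vdash \mathsf{congimpl}\ t_a\ (\lambda(A_1).t_b) : \Pi(A_1).\lfloor B_1 \rfloor_{|\Phi|+1} = \Pi(A_2).\lfloor B_2 \rfloor_{|\Phi|+1}}
\]

\[
\dfrac{\Phi,\ A \vdash \lceil t_b \rceil_{|\Phi|} : B = B' \quad \Phi \vdash \Pi(A).\lfloor B \rfloor_{|\Phi|+1} = \Pi(A).\lfloor B' \rfloor_{|\Phi|+1} : \mathsf{Prop}}{\Phi \vdash \mathsf{congpi}\ (\lambda(A).t_b) : \Pi(A).\lfloor B \rfloor_{|\Phi|+1} = \Pi(A).\lfloor B' \rfloor_{|\Phi|+1}}
\]

\[
\dfrac{\Phi,\ A \vdash \lceil t_b \rceil_{|\Phi|} : B_1 = B_2 \quad \Phi \vdash \lambda(A).\lfloor B_1 \rfloor_{|\Phi|+1} = \lambda(A).\lfloor B_2 \rfloor_{|\Phi|+1} : \mathsf{Prop}}{\Phi \vdash \mathsf{conglam}\ (\lambda(A).t_b) : \lambda(A).\lfloor B_1 \rfloor_{|\Phi|+1} = \lambda(A).\lfloor B_2 \rfloor_{|\Phi|+1}}
\]

\[
\dfrac{\Phi \vdash \lambda(A).M : A \to B \quad \Phi \vdash N : A \quad \Phi \vdash A \to B : \mathsf{Type}}{\Phi \vdash \mathsf{beta}\ (\lambda(A).M)\ N : (\lambda(A).M)\ N = \lceil M \rceil_{|\Phi|} \cdot (id_\Phi,\ N)}
\]

Substitution typing, $\Phi \vdash \sigma : \Phi'$:

\[
\dfrac{\vdash \Phi\ \mathit{wf}}{\Phi \vdash \bullet : \bullet}
\qquad
\dfrac{\Phi \vdash \sigma : \Phi' \quad \Phi \vdash t : t' \cdot \sigma}{\Phi \vdash (\sigma,\ t) : (\Phi',\ t')}
\]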

Lemma A.12 If $t <^f m$ and $|\Phi| = m$ then $t \cdot id_\Phi = t$.

Trivial by induction on $t <^f m$. The interesting case is $f_i \cdot id_\Phi = f_i$. This is simple to prove by induction on $\Phi$.

Lemma A.13 If $\sigma <^f m$ and $|\Phi| = m$ then $\sigma \cdot id_\Phi = \sigma$.

By induction on $\sigma$ and use of lemma A.12.

Lemma A.14 If $\Phi \vdash t : t'$ then $t <^f |\Phi|$ and $t <^b 0$.

Trivial by induction on the typing derivation $\Phi \vdash t : t'$ (and use of implicit assumptions for $\lceil t \rceil$).

Lemma A.15 If $\vdash \Phi\ \mathit{wf}$ then for any $\Phi'$ and $t_{1 \ldots n}$ such that $\Phi' = \Phi, t_1, t_2, \cdots, t_n$ and $\vdash \Phi'\ \mathit{wf}$ we have that $\Phi' \vdash id_\Phi : \Phi$.

By induction on $\Phi$.

In case $\Phi = \bullet$, trivial.

In case $\Phi = \Phi'', t'$, then by induction hypothesis we have for all proper extensions of $\Phi''$: $\Phi'', t_1, \cdots, t_n \vdash id_{\Phi''} : \Phi''$.

We now need to prove that for all proper extensions of $\Phi'', t'$ we have $\Phi'', t', t_1, \cdots, t_n \vdash id_{\Phi'', t'} : (\Phi'', t')$.

From the inductive hypothesis we get that $\Phi'', t', t_1, \cdots, t_n \vdash id_{\Phi''} : \Phi''$. We also have that $\Phi'' \vdash t' : s$ by inversion of the well-formedness of $\Phi$.

Thus by A.14, we get that $t' <^f |\Phi''|$.

Furthermore by A.12 we get that $t' \cdot id_{\Phi''} = t'$.

Thus we have $\Phi'', t', t_1, \cdots, t_n \vdash f_{|\Phi''|} : t' \cdot id_{\Phi''}$.

Thus by applying the appropriate substitution typing rule, we get that $\Phi'', t', t_1, \cdots, t_n \vdash (id_{\Phi''},\ f_{|\Phi''|}) : (\Phi'', t')$, which is exactly the desired result.

Lemma A.16 If $\Phi \vdash \sigma : \Phi'$ then $\sigma <^{f} |\Phi|$, $\sigma <^{b} 0$ and $|\sigma| = |\Phi'|$.

Trivial by induction on the typing derivation for $\sigma$, and use of lemma A.14.

Lemma A.17 If $\vdash \Phi\ \text{wf}$ and $|\Phi| = n$ then for all $i < n$, $\Phi.i <^{f} i$.

Trivial by induction on the well-formedness derivation for $\Phi$ and use of lemma A.14.

Lemma A.18 If $t <^{f} m$, $|\sigma| = m$ and $t \cdot \sigma = t'$ then $t \cdot (\sigma, t_1, t_2, \cdots, t_n) = t'$.

Trivial by induction on $t <^{f} m$.

Lemma A.19 If $\sigma <^{f} m$, $|\sigma'| = m$ and $\sigma \cdot \sigma' = \sigma_r$ then $\sigma \cdot (\sigma', t_1, t_2, \cdots, t_n) = \sigma_r$.

Trivial by induction on $\sigma$, and use of lemma A.18.

Lemma A.20 If $\vdash \Phi\ \text{wf}$, $\Phi.i = t$ and $\Phi' \vdash \sigma : \Phi$, then $\Phi' \vdash \sigma.i : t \cdot \sigma$.

Induction on the derivation of typing for $\sigma$.

In the case where $\sigma = \bullet$, the (implicit) assumption that $i < |\Phi|$ obviously does not hold, so the case is impossible.

In the case where $\sigma = \sigma', t'$, we split cases on whether $i = |\Phi| - 1$ or not.

If it is, then the typing rule gives us the desired result directly.

If it is not, the inductive hypothesis gives us the result $\Phi' \vdash \sigma'.i : t \cdot \sigma'$. Now from lemma A.17 we have that $\Phi.i <^{f} i$. We can now apply lemma A.18 to get $t \cdot \sigma' = t \cdot (\sigma', t') = t \cdot \sigma$, proving the desired result.

Lemma A.21 If $t <^{f} m$, $t <^{b} n + 1$, $\sigma <^{f} m'$ and $|\sigma| = m$ then $\lceil t \cdot \sigma \rceil^{n}_{m'} = \lceil t \rceil^{n}_{m} \cdot (\sigma, f_{m'})$.

By structural induction on $t$.

Cases $t = s$ and $t = c$ are trivial.

When $t = f_i$, we have $i < m$ thus both sides will be equal to $\sigma.i$.

When $t = b_i$, we split cases on whether $i = n$ or $i < n$.

If $i = n$, then the left-hand side becomes $\lceil b_n \cdot \sigma \rceil^{n}_{m'} = \lceil b_n \rceil^{n}_{m'} = f_{m'}$.

The right-hand side becomes $\lceil b_n \rceil^{n}_{m} \cdot (\sigma, f_{m'}) = f_m \cdot (\sigma, f_{m'}) = f_{m'}$.

When $i < n$ it is trivial to see that both sides are equal to $b_i$.

In the case where $t = \lambda(t_1).(t_2)$, we prove the result trivially using the induction hypothesis.

The subtlety for $t_2$ is that the inductive hypothesis is applied for $n = n + 1$, which is possible because from the definition of $\cdot <^{b} \cdot$ we have that $t_2 <^{b} (n + 1) + 1$.
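The statement of lemma A.21 can be exercised on a concrete term. The sketch below is illustrative only: the tuple encoding and the names `subst` and `freshen` are assumptions, and the definition of freshening is reconstructed from the proof above (it maps $b_n$ to $f_m$ and moves to $n + 1$ under the body of a binder), not copied from definition A.9.

```python
def subst(t, sigma):
    """Apply sigma (a tuple of terms) to t, using f_i . sigma = sigma.i."""
    tag = t[0]
    if tag == "f":
        return sigma[t[1]]
    if tag in ("lam", "pi", "app"):
        return (tag, subst(t[1], sigma), subst(t[2], sigma))
    return t

def freshen(t, n, m):
    """Freshening: replace the bound variable b_n by the free variable f_m."""
    tag = t[0]
    if tag == "b":
        return ("f", m) if t[1] == n else t
    if tag in ("lam", "pi"):
        # the body of a binder is freshened at n + 1
        return (tag, freshen(t[1], n, m), freshen(t[2], n + 1, m))
    if tag == "app":
        return (tag, freshen(t[1], n, m), freshen(t[2], n, m))
    return t

m, n, m2 = 2, 0, 3                                    # m2 plays the role of m'
sigma = (("f", 2), ("app", ("f", 0), ("c", "k")))     # |sigma| = m, free variables below m'
t = ("app", ("b", 0), ("lam", ("f", 1), ("app", ("b", 1), ("f", 0))))

lhs = freshen(subst(t, sigma), n, m2)                 # freshen after substituting
rhs = subst(freshen(t, n, m), sigma + (("f", m2),))   # substitute with (sigma, f_m')
assert lhs == rhs
```

Note how the occurrence `("b", 1)` under the binder is freshened together with the top-level `("b", 0)`; this is the case where the induction hypothesis is used at $n + 1$.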

Lemma A.22 If $t <^{f} m + 1$, $t <^{b} n$, $\sigma <^{f} m'$ and $|\sigma| = m$ then $\lfloor t \cdot (\sigma, f_{m'}) \rfloor^{n}_{m'+1} = \lfloor t \rfloor^{n}_{m+1} \cdot \sigma$.

By structural induction on $t$. Cases $t = s$ and $t = c$ are trivial. When $t = f_i$, we split cases on whether $i = m$ or $i < m$. If $i = m$, then the left-hand side becomes: $\lfloor f_m \cdot (\sigma, f_{m'}) \rfloor^{n}_{m'+1} = \lfloor f_{m'} \rfloor^{n}_{m'+1} = b_n$. The right-hand side becomes: $\lfloor f_m \rfloor^{n}_{m+1} \cdot \sigma = b_n \cdot \sigma = b_n$. In case $i < m$, both sides are trivially equal to $\sigma.i$. When $t = b_i$, both sides are trivially equal to $b_i$. When $t = \lambda(t_1).t_2$, the result follows directly from the inductive hypothesis for $t_1$ and $t_2$, and the definitions of $\cdot$ and $\lfloor \cdot \rfloor$.
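Lemma A.22 is the counterpart of lemma A.21 for binding, and can be checked the same way. This is a hedged sketch: the encoding and the names `subst` and `bind` are assumptions, and binding is reconstructed from the proof above, where $\lfloor f_m \rfloor^{n}_{m+1} = b_n$ (the subscript is the length of the context, so the variable that gets bound is the last one).

```python
def subst(t, sigma):
    """Apply sigma (a tuple of terms) to t, using f_i . sigma = sigma.i."""
    tag = t[0]
    if tag == "f":
        return sigma[t[1]]
    if tag in ("lam", "pi", "app"):
        return (tag, subst(t[1], sigma), subst(t[2], sigma))
    return t

def bind(t, n, m):
    """Binding: replace the free variable f_{m-1} by the bound variable b_n."""
    tag = t[0]
    if tag == "f":
        return ("b", n) if t[1] == m - 1 else t
    if tag in ("lam", "pi"):
        # the body of a binder is bound at n + 1
        return (tag, bind(t[1], n, m), bind(t[2], n + 1, m))
    if tag == "app":
        return (tag, bind(t[1], n, m), bind(t[2], n, m))
    return t

m, n, m2 = 2, 0, 3                                    # m2 plays the role of m'
sigma = (("f", 2), ("app", ("f", 0), ("c", "k")))     # |sigma| = m, free variables below m'
t = ("app", ("f", 2), ("lam", ("f", 1), ("app", ("f", 2), ("f", 0))))

lhs = bind(subst(t, sigma + (("f", m2),)), n, m2 + 1) # bind after substituting
rhs = subst(bind(t, n, m + 1), sigma)                 # substitute after binding
assert lhs == rhs
```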

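Lemma A.23 If $t <^{f} m$, $|\sigma| = m$, $\sigma <^{f} m'$ and $|\sigma'| = m'$ then $(t \cdot \sigma) \cdot \sigma' = t \cdot (\sigma \cdot \sigma')$.

Trivial induction, with the only interesting case where $t = f_i$. The left-hand side becomes $(f_i \cdot \sigma) \cdot \sigma' = (\sigma.i) \cdot \sigma'$. The right-hand side becomes $f_i \cdot (\sigma \cdot \sigma') = (\sigma \cdot \sigma').i = (\sigma.i) \cdot \sigma'$.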
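The associativity property of lemma A.23 is easy to test on an instance. In this sketch the tuple encoding and the names `subst` and `compose` are assumptions; `compose` applies the second substitution to every element of the first, which is how $\sigma \cdot \sigma'$ is used in the proof above.

```python
def subst(t, sigma):
    """Apply sigma (a tuple of terms) to t, using f_i . sigma = sigma.i."""
    tag = t[0]
    if tag == "f":
        return sigma[t[1]]
    if tag in ("lam", "pi", "app"):
        return (tag, subst(t[1], sigma), subst(t[2], sigma))
    return t

def compose(sigma, sigma2):
    """sigma . sigma2: apply sigma2 to each element of sigma."""
    return tuple(subst(t, sigma2) for t in sigma)

t = ("app", ("f", 0), ("lam", ("f", 1), ("b", 0)))          # free variables below m = 2
sigma = (("f", 1), ("app", ("f", 0), ("f", 2)))             # |sigma| = 2, free variables below m' = 3
sigma2 = (("c", "a"), ("c", "b"), ("f", 0))                 # |sigma2| = 3

lhs = subst(subst(t, sigma), sigma2)                        # (t . sigma) . sigma'
rhs = subst(t, compose(sigma, sigma2))                      # t . (sigma . sigma')
assert lhs == rhs
```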

Lemma A.24 If $|\sigma| = m$ and $|\Phi| = m$ then $id_{\Phi} \cdot \sigma = \sigma$.

Trivial by induction on $\Phi$.

Lemma A.25 If $\lceil t \rceil^{n}_{m} = \lceil t' \rceil^{n}_{m}$ then $t = t'$.

By induction on the structure of $t$. In each case we perform induction on $t'$ as well. The only interesting case is when $t = f_i$ and $t' = b_n$. We have that $\lceil t' \rceil^{n}_{m} = f_m$, so it could be that $i = m$. This is avoided from the implicit assumption that $t <^{f} m$ (that is required to apply freshening).


The main substitution theorem that we are proving is the following.

Theorem A.26 (Substitution)

If $\Phi \vdash t : t'$ and $\Phi' \vdash \sigma : \Phi$ then $\Phi' \vdash t \cdot \sigma : t' \cdot \sigma$.

By structural induction on the typing derivation for $t$.

Case $\dfrac{c : t \in \Sigma}{\Phi \vdash_{\Sigma} c : t}$ $\triangleright$

By applying the same typing rule we get that $\Phi' \vdash c : t$. By inversion of the well-formedness of $\Sigma$, we get that $\bullet \vdash t : t'$. Thus from lemma A.14 we get that $t <^{f} 0$ and from lemma A.18 we get that $t \cdot \sigma = t$. Considering also that $c \cdot \sigma = c$, the derivation $\Phi' \vdash c : t$ proves the desired result.


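Case $\dfrac{\Phi.i = t}{\Phi \vdash f_i : t}$ $\triangleright$

We have that $f_i \cdot \sigma = \sigma.i$. Directly using lemma A.20 we get that $\Phi' \vdash \sigma.i : t \cdot \sigma$.

Case $\dfrac{(s, s') \in \mathcal{A}}{\Phi \vdash s : s'}$ $\triangleright$

Trivial by application of the same rule and the definition of $\cdot$.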


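Case $\dfrac{\Phi \vdash t_1 : s \qquad \Phi, t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : s' \qquad (s, s', s'') \in \mathcal{R}}{\Phi \vdash \Pi(t_1).t_2 : s''}$ $\triangleright$

By induction hypothesis for $t_1$ we get: $\Phi' \vdash t_1 \cdot \sigma : s$.

By induction hypothesis for $\Phi, t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : s'$ and $\Phi', t_1 \cdot \sigma \vdash (\sigma, f_{|\Phi'|}) : (\Phi, t_1)$ we get: $\Phi', t_1 \cdot \sigma \vdash \lceil t_2 \rceil_{|\Phi|} \cdot (\sigma, f_{|\Phi'|}) : s' \cdot (\sigma, f_{|\Phi'|})$.

We have $s' = s' \cdot (\sigma, f_{|\Phi'|})$ trivially.

Also by the lemma A.21, $\lceil t_2 \rceil_{|\Phi|} \cdot (\sigma, f_{|\Phi'|}) = \lceil t_2 \cdot \sigma \rceil_{|\Phi'|}$.

Thus by application of the same typing rule we get $\Phi' \vdash \Pi(t_1 \cdot \sigma).(t_2 \cdot \sigma) : s''$, which is the desired result, since $(\Pi(t_1).t_2) \cdot \sigma = \Pi(t_1 \cdot \sigma).(t_2 \cdot \sigma)$.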

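Case $\dfrac{\Phi \vdash t_1 : s \qquad \Phi, t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : t' \qquad \Phi \vdash \Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1} : s'}{\Phi \vdash \lambda(t_1).t_2 : \Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}}$ $\triangleright$

Similarly to the above, from the inductive hypothesis for $t_1$ and $t_2$ we get:

$\Phi' \vdash t_1 \cdot \sigma : s$

$\Phi', t_1 \cdot \sigma \vdash \lceil t_2 \cdot \sigma \rceil_{|\Phi'|} : t' \cdot (\sigma, f_{|\Phi'|})$

From the inductive hypothesis for $\Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}$ we get: $\Phi' \vdash (\Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}) \cdot \sigma : s'$.

By the definition of $\cdot$ we get: $\Phi' \vdash \Pi(t_1 \cdot \sigma).(\lfloor t' \rfloor_{|\Phi|+1} \cdot \sigma) : s'$.

By the lemma A.22, we have that $(\lfloor t' \rfloor_{|\Phi|+1} \cdot \sigma) = \lfloor t' \cdot (\sigma, f_{|\Phi'|}) \rfloor_{|\Phi'|+1}$.

Thus we get $\Phi' \vdash \Pi(t_1 \cdot \sigma).\lfloor t' \cdot (\sigma, f_{|\Phi'|}) \rfloor_{|\Phi'|+1} : s'$.

We can now apply the same typing rule to get: $\Phi' \vdash \lambda(t_1 \cdot \sigma).(t_2 \cdot \sigma) : \Pi(t_1 \cdot \sigma).\lfloor t' \cdot (\sigma, f_{|\Phi'|}) \rfloor_{|\Phi'|+1}$.

We have $\Pi(t_1 \cdot \sigma).\lfloor t' \cdot (\sigma, f_{|\Phi'|}) \rfloor_{|\Phi'|+1} = \Pi(t_1 \cdot \sigma).(\lfloor t' \rfloor_{|\Phi|+1} \cdot \sigma) = (\Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}) \cdot \sigma$, thus this is the desired result.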

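Case $\dfrac{\Phi \vdash t_1 : \Pi(t).t' \qquad \Phi \vdash t_2 : t}{\Phi \vdash t_1\ t_2 : \lceil t' \rceil_{|\Phi|} \cdot (id_{\Phi}, t_2)}$ $\triangleright$

By induction hypothesis for $t_1$ we get $\Phi' \vdash t_1 \cdot \sigma : \Pi(t \cdot \sigma).(t' \cdot \sigma)$.

By induction hypothesis for $t_2$ we get $\Phi' \vdash t_2 \cdot \sigma : t \cdot \sigma$.

By application of the same typing rule we get $\Phi' \vdash (t_1\ t_2) \cdot \sigma : \lceil t' \cdot \sigma \rceil_{|\Phi'|} \cdot (id_{\Phi'}, t_2 \cdot \sigma)$.

We have that $\lceil t' \cdot \sigma \rceil_{|\Phi'|} \cdot (id_{\Phi'}, t_2 \cdot \sigma) = (\lceil t' \rceil_{|\Phi|} \cdot (\sigma, f_{|\Phi'|})) \cdot (id_{\Phi'}, t_2 \cdot \sigma)$ due to lemma A.21.

From lemma A.23, $(t \cdot \sigma) \cdot \sigma' = t \cdot (\sigma \cdot \sigma')$, we further have that the above is equal to $\lceil t' \rceil_{|\Phi|} \cdot ((\sigma, f_{|\Phi'|}) \cdot (id_{\Phi'}, t_2 \cdot \sigma))$.

We will now prove that $((\sigma, f_{|\Phi'|}) \cdot (id_{\Phi'}, t_2 \cdot \sigma)) = \sigma, (t_2 \cdot \sigma)$.

By definition we have $(\sigma, f_{|\Phi'|}) \cdot (id_{\Phi'}, t_2 \cdot \sigma) = (\sigma \cdot (id_{\Phi'}, t_2 \cdot \sigma)),\ f_{|\Phi'|} \cdot (id_{\Phi'}, t_2 \cdot \sigma) = (\sigma \cdot (id_{\Phi'}, t_2 \cdot \sigma)),\ t_2 \cdot \sigma$.

Due to lemma A.16, we have that $\sigma <^{f} |\Phi'|$. Thus from lemma A.19, we get that $\sigma \cdot (id_{\Phi'}, t_2 \cdot \sigma) = \sigma \cdot id_{\Phi'}$.

Last, from lemma A.13 we get that $\sigma \cdot id_{\Phi'} = \sigma$.

Thus we only need to show that $\lceil t' \rceil_{|\Phi|} \cdot (\sigma, (t_2 \cdot \sigma))$ is equal to $(\lceil t' \rceil_{|\Phi|} \cdot (id_{\Phi}, t_2)) \cdot \sigma$.

As above, per lemma A.23, this is equal to $\lceil t' \rceil_{|\Phi|} \cdot ((id_{\Phi}, t_2) \cdot \sigma)$.

From definition we have $((id_{\Phi}, t_2) \cdot \sigma) = (id_{\Phi} \cdot \sigma), (t_2 \cdot \sigma)$.

Furthermore, from lemma A.24 we get that $(id_{\Phi} \cdot \sigma), (t_2 \cdot \sigma) = \sigma, (t_2 \cdot \sigma)$.

Thus we have the desired result.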

Case  (otherwise)  > 

Simple  to  prove  based  on  the  methods  we  have  shown  above. 

Corollary A.27 If $\Phi' \vdash \sigma : \Phi$ and $\Phi'' \vdash \sigma' : \Phi'$ then $\Phi'' \vdash \sigma \cdot \sigma' : \Phi$.

Induction on the typing derivation for $\sigma$, with use of the substitution theorem A.26.

Lemma A.28 (Types are well-typed) If $\Phi \vdash t : t'$ then either $t' = \text{Type}'$ or $\Phi \vdash t' : s$.

By structural induction on the typing derivation for $t$.
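Case $\dfrac{c : t \in \Sigma}{\Phi \vdash_{\Sigma} c : t}$ $\triangleright$ Trivial by inversion of the well-formedness of $\Sigma$.

Case $\dfrac{\Phi.i = t}{\Phi \vdash f_i : t}$ $\triangleright$ Trivial by inversion of the well-formedness of $\Phi$.

Case $\dfrac{(s, s') \in \mathcal{A}}{\Phi \vdash s : s'}$ $\triangleright$ By splitting cases for $(s, s')$ and application of the same typing rule.

Case $\dfrac{\Phi \vdash t_1 : s \qquad \Phi, t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : s' \qquad (s, s', s'') \in \mathcal{R}}{\Phi \vdash \Pi(t_1).t_2 : s''}$ $\triangleright$ By splitting cases for $(s, s', s'')$ and use of sort typing rule.

Case $\dfrac{\Phi \vdash t_1 : \Pi(t).t' \qquad \Phi \vdash t_2 : t}{\Phi \vdash t_1\ t_2 : \lceil t' \rceil_{|\Phi|} \cdot (id_{\Phi}, t_2)}$ $\triangleright$

By induction hypothesis we get that $\Phi \vdash \Pi(t).t' : s$. By inversion of this judgement, we get that $\Phi, t \vdash \lceil t' \rceil_{|\Phi|} : s'$.

Furthermore we have by lemma A.15 that $\Phi \vdash id_{\Phi} : \Phi$, and using the typing for $t_2$ and lemma A.12, we get that $\Phi \vdash id_{\Phi}, t_2 : (\Phi, t)$.

Thus by application of the substitution lemma A.26 for $\lceil t' \rceil_{|\Phi|}$ we get the desired result.

Case (otherwise) $\triangleright$ Simple to prove based on the methods we have shown above.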


Lemma A.29 (Weakening) If $\Phi \vdash t : t'$, then $\Phi, t_1, t_2, \cdots, t_n \vdash t : t'$.

Using lemma A.15 we have that $\Phi, t_1, t_2, \cdots, t_n \vdash id_{\Phi} : \Phi$.

Using the substitution lemma A.26 we get that $\Phi, t_1, t_2, \cdots, t_n \vdash t \cdot id_{\Phi} : t' \cdot id_{\Phi}$.

From lemma A.18 and A.14, we get that $t \cdot id_{\Phi} = t$.

From lemma A.28 we further get $\Phi \vdash t' : s$ and applying the same lemmas as for $t$ we get $t' \cdot id_{\Phi} = t'$.


B.  Extension  with  metavariables  and  polymorphic  contexts 

B.1 Extending with metavariables

First, we extend the previous definition of terms to account for metavariables.


Definition B.1 (Syntax of the language) The syntax of the logic language is extended below. We furthermore add new syntactic classes for modal terms and environments of metavariables.


$t ::= \cdots \mid X_i/\sigma$
$T ::= [\Phi]\, t$
$\mathcal{M} ::= \bullet \mid \mathcal{M},\ T$

Now we gather all the places from the above section where something was defined through induction on terms, and redefine/extend them here. Things that are identical are noted.


Definition B.2 (Context length and access) Identical to A.2. We furthermore define metavariables environment length and access here.

$|\mathcal{M}|$:

$|\bullet| = 0$

$|\mathcal{M}, T| = |\mathcal{M}| + 1$

$\mathcal{M}.i$:

$(\mathcal{M}, T).|\mathcal{M}| = T$
$(\mathcal{M}, T).i = \mathcal{M}.i$

Definition B.3 (Substitution length) Identical to A.3.

Definition B.4 (Substitution access) Identical to A.4.

Definition B.5 (Substitution application) This is the extension of definition A.5. We lift it to modal terms.

$t \cdot \sigma$:

$(X_i/\sigma') \cdot \sigma = X_i/(\sigma' \cdot \sigma)$

$T \cdot \sigma$:

$([\Phi]\, t) \cdot \sigma = t \cdot \sigma$
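The two new clauses of definition B.5 can be sketched as follows. The encoding is an assumption of this sketch: a metavariable use $X_i/\sigma'$ is the tuple `("meta", i, sigma')`, a modal term $[\Phi]\,t$ is the pair `(phi, t)`, and the names `subst` and `subst_modal` are hypothetical. The remaining clauses assume $f_i \cdot \sigma = \sigma.i$ as in the base language.

```python
def subst(t, sigma):
    """Substitution application extended with the metavariable clause."""
    tag = t[0]
    if tag == "f":
        return sigma[t[1]]
    if tag == "meta":
        # (X_i / sigma') . sigma = X_i / (sigma' . sigma)
        return ("meta", t[1], tuple(subst(u, sigma) for u in t[2]))
    if tag in ("lam", "pi", "app"):
        return (tag, subst(t[1], sigma), subst(t[2], sigma))
    return t

def subst_modal(modal, sigma):
    """([Phi] t) . sigma = t . sigma: the context annotation is dropped."""
    _phi, t = modal
    return subst(t, sigma)

x = ("meta", 0, (("f", 1), ("c", "k")))                # X_0 / (f_1, k)
sigma = (("c", "a"), ("app", ("f", 0), ("f", 0)))
assert subst(x, sigma) == ("meta", 0, (("app", ("f", 0), ("f", 0)), ("c", "k")))

ident = (("f", 0), ("f", 1))
assert subst(x, ident) == x                            # the metavariable case of Lemma B.11

modal = ((("s", "Prop"),), ("f", 0))                   # [Prop] f_0
assert subst_modal(modal, (("c", "a"),)) == ("c", "a")
```

The substitution attached to a metavariable is suspended: applying $\sigma$ only composes it with the stored $\sigma'$, since the term that will replace $X_i$ is not yet known.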

Definition  B.6  (Identity  substitution)  Identical  to  A.6. 

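Definition B.7 (Variable limits for terms and substitutions) This is the extension of definition A.7 and definition A.8 (which are now mutually dependent). The definition for substitutions is identical.

$t <^{f} n$:

$X_i/\sigma <^{f} n \ \Leftarrow\ \sigma <^{f} n$

$t <^{b} n$:

$X_i/\sigma <^{b} n \ \Leftarrow\ \sigma <^{b} n$

Definition B.8 (Freshening) This is the extension of definition A.9. Furthermore we need to lift the freshening operation to substitutions.

$\lceil t \rceil^{n}_{m}$:

$\lceil X_i/\sigma \rceil^{n}_{m} = X_i/(\lceil \sigma \rceil^{n}_{m})$

$\lceil \sigma \rceil^{n}_{m}$:

$\lceil \bullet \rceil^{n}_{m} = \bullet$

$\lceil \sigma, t \rceil^{n}_{m} = \lceil \sigma \rceil^{n}_{m},\ \lceil t \rceil^{n}_{m}$

Definition B.9 (Binding) This is the extension of definition A.10. As above, we need to lift binding to substitutions.

$\lfloor t \rfloor^{n}_{m}$:

$\lfloor X_i/\sigma \rfloor^{n}_{m} = X_i/(\lfloor \sigma \rfloor^{n}_{m})$

$\lfloor \sigma \rfloor^{n}_{m}$:

$\lfloor \bullet \rfloor^{n}_{m} = \bullet$

$\lfloor \sigma, t \rfloor^{n}_{m} = \lfloor \sigma \rfloor^{n}_{m},\ \lfloor t \rfloor^{n}_{m}$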

Definition B.10 (Typing judgements) The typing judgements defined in A.11 are adjusted as follows.

First, the judgement $\Phi \vdash t : t'$ is replaced by the judgement $\mathcal{M}; \Phi \vdash t : t'$ and the existing rules are adjusted as needed. Also we include a new rule shown below.

Second, the judgement $\vdash \Phi\ \text{wf}$ is replaced by the judgement $\mathcal{M} \vdash \Phi\ \text{wf}$.

Third, the judgement $\mathcal{M}; \Phi \vdash \sigma : \Phi'$ replaces the original judgement for substitutions.

The $\vdash \Sigma\ \text{wf}$ judgement stays as is, with the adjustment shown below.

Last, we introduce a new judgement $\vdash \mathcal{M}\ \text{wf}$ for meta-environments and a judgement $\mathcal{M} \vdash T : T'$ for modal terms.


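$\vdash \Sigma\ \text{wf}$:

$\dfrac{}{\vdash \bullet\ \text{wf}}$ $\qquad$ $\dfrac{\vdash \Sigma\ \text{wf} \qquad \bullet; \bullet \vdash t : s \qquad (c : \_) \notin \Sigma}{\vdash (\Sigma, c : t)\ \text{wf}}$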


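$\mathcal{M}; \Phi \vdash t : t'$:

$\dfrac{\mathcal{M}.i = T \qquad T = [\Phi']\, t' \qquad \mathcal{M}; \Phi \vdash \sigma : \Phi'}{\mathcal{M}; \Phi \vdash X_i/\sigma : t' \cdot \sigma}$

$\vdash \mathcal{M}\ \text{wf}$:

$\dfrac{}{\vdash \bullet\ \text{wf}}$ $\qquad$ $\dfrac{\vdash \mathcal{M}\ \text{wf} \qquad \mathcal{M} \vdash [\Phi]\, t : [\Phi]\, s}{\vdash (\mathcal{M}, [\Phi]\, t)\ \text{wf}}$

$\mathcal{M} \vdash T : T'$:

$\dfrac{\mathcal{M}; \Phi \vdash t : t'}{\mathcal{M} \vdash [\Phi]\, t : [\Phi]\, t'}$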


We  can  now  proceed  to  adjust  the  proofs  from  above  in  order  to  handle  the  additional  cases  of  the  extension. 

Lemma B.11 (Extension of lemmas A.12 and A.13) 1. If $t <^{f} m$ and $|\Phi| = m$ then $t \cdot id_{\Phi} = t$.

2. If $\sigma <^{f} m$ and $|\Phi| = m$ then $\sigma \cdot id_{\Phi} = \sigma$.

The two lemmas become mutually dependent. For the first part, we proceed as previously by induction on $t$, and the only additional case we need to take into account is for the extension*:

We have that $(X_i/\sigma) \cdot id_{m} = X_i/(\sigma \cdot id_{m})$. Using the second part, we have that $X_i/(\sigma \cdot id_{m}) = X_i/\sigma$. The second part is proved as previously.

* We will not note this any more below; all the proofs mimic the inductive structure of the base proofs.
Lemma B.12 (Extension of lemmas A.14 and A.16) 1. If $\mathcal{M}; \Phi \vdash t : t'$ then $t <^{f} |\Phi|$ and $t <^{b} 0$.

2. If $\mathcal{M}; \Phi \vdash \sigma : \Phi'$ then $\sigma <^{f} |\Phi|$, $\sigma <^{b} 0$ and $|\sigma| = |\Phi'|$.

Again the two lemmas become mutually dependent when they weren't before. For the first one, we have that $\mathcal{M}; \Phi \vdash X_i/\sigma : t'$; using the second part, we have that $\sigma <^{f} |\Phi|$ and $\sigma <^{b} 0$. By definition we thus have $X_i/\sigma <^{f} |\Phi|$ and $X_i/\sigma <^{b} 0$. The second part is proved as previously.

Lemma B.13 (Extension of lemma A.15) If $\mathcal{M} \vdash \Phi\ \text{wf}$ then for any $\Phi'$ and $t_{1 \ldots n}$ such that $\Phi' = \Phi, t_1, t_2, \cdots, t_n$ and $\mathcal{M} \vdash \Phi'\ \text{wf}$ we have that $\mathcal{M}; \Phi' \vdash id_{\Phi} : \Phi$.

Identical as before.

Lemma B.14 (Extension of lemma A.17) If $\mathcal{M} \vdash \Phi\ \text{wf}$ and $|\Phi| = n$ then for all $i < n$, $\Phi.i <^{f} i$.

Identical as before.

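Lemma B.15 (Extension of lemmas A.18 and A.19) 1. If $t <^{f} m$, $|\sigma| = m$ and $t \cdot \sigma = t'$ then $t \cdot (\sigma, t_1, t_2, \cdots, t_n) = t'$.

2. If $\sigma <^{f} m$, $|\sigma'| = m$ and $\sigma \cdot \sigma' = \sigma_r$ then $\sigma \cdot (\sigma', t_1, t_2, \cdots, t_n) = \sigma_r$.

For the first part, taking $t = X_i/\sigma'$, we have that $X_i/\sigma' <^{f} m$ and thus $\sigma' <^{f} m$.

Furthermore we have $(X_i/\sigma') \cdot \sigma = X_i/(\sigma' \cdot \sigma) = X_i/\sigma_r$, assuming $\sigma_r = \sigma' \cdot \sigma$.

Using the second lemma we have that $\sigma' \cdot (\sigma, t_1, t_2, \cdots, t_n) = \sigma_r$.

Thus we also have that $(X_i/\sigma') \cdot (\sigma, t_1, t_2, \cdots, t_n) = X_i/(\sigma' \cdot (\sigma, t_1, t_2, \cdots, t_n)) = X_i/\sigma_r$.

For the second part, the proof proceeds as previously.

Lemma B.16 (Extension of lemma A.20) If $\mathcal{M} \vdash \Phi\ \text{wf}$, $\Phi.i = t$ and $\mathcal{M}; \Phi' \vdash \sigma : \Phi$, then $\mathcal{M}; \Phi' \vdash \sigma.i : t \cdot \sigma$.

Identical as before.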

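Lemma B.17 (Extension of lemma A.21 and new lemma for substitutions) 1. If $t <^{f} m$, $t <^{b} n + 1$, $\sigma <^{f} m'$ and $|\sigma| = m$ then $\lceil t \cdot \sigma \rceil^{n}_{m'} = \lceil t \rceil^{n}_{m} \cdot (\sigma, f_{m'})$.

2. If $\sigma' <^{f} m$, $\sigma' <^{b} n + 1$, $\sigma <^{f} m'$ and $|\sigma| = m$ then $\lceil \sigma' \cdot \sigma \rceil^{n}_{m'} = \lceil \sigma' \rceil^{n}_{m} \cdot (\sigma, f_{m'})$.

The second part of this lemma is a new lemma; it corresponds to the lifting of the first part to substitutions.

For the first part, we have: $\lceil (X_i/\sigma') \cdot \sigma \rceil^{n}_{m'} = \lceil X_i/(\sigma' \cdot \sigma) \rceil^{n}_{m'} = X_i/\lceil \sigma' \cdot \sigma \rceil^{n}_{m'}$.

Using the second part, we have that this is equal to $X_i/(\lceil \sigma' \rceil^{n}_{m} \cdot (\sigma, f_{m'}))$.

Furthermore, this is equal to $(X_i/\lceil \sigma' \rceil^{n}_{m}) \cdot (\sigma, f_{m'})$.

Last, this is equal to $(\lceil X_i/\sigma' \rceil^{n}_{m}) \cdot (\sigma, f_{m'})$, which is the desired result.

For the second part, we proceed by induction on $\sigma'$.

If $\sigma' = \bullet$, the result is trivial.

If $\sigma' = \sigma'', t$ then $\lceil (\sigma'', t) \cdot \sigma \rceil^{n}_{m'} = \lceil (\sigma'' \cdot \sigma),\ t \cdot \sigma \rceil^{n}_{m'} = \lceil \sigma'' \cdot \sigma \rceil^{n}_{m'},\ \lceil t \cdot \sigma \rceil^{n}_{m'}$.

Using the induction hypothesis and the first part, we have that this is equal to $\lceil \sigma'' \rceil^{n}_{m} \cdot (\sigma, f_{m'}),\ \lceil t \rceil^{n}_{m} \cdot (\sigma, f_{m'}) = \lceil \sigma'', t \rceil^{n}_{m} \cdot (\sigma, f_{m'})$, which is the desired result.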

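Lemma B.18 (Extension of lemma A.22 and new lemma for substitutions) 1. If $t <^{f} m + 1$, $t <^{b} n$, $\sigma <^{f} m'$ and $|\sigma| = m$ then $\lfloor t \cdot (\sigma, f_{m'}) \rfloor^{n}_{m'+1} = \lfloor t \rfloor^{n}_{m+1} \cdot \sigma$.

2. If $\sigma' <^{f} m + 1$, $\sigma' <^{b} n$, $\sigma <^{f} m'$ and $|\sigma| = m$ then $\lfloor \sigma' \cdot (\sigma, f_{m'}) \rfloor^{n}_{m'+1} = \lfloor \sigma' \rfloor^{n}_{m+1} \cdot \sigma$.

This proof is entirely similar to the above for both parts.

Lemma B.19 (Extension of lemma A.23 and new lemma for substitutions) 1. If $t <^{f} m$, $|\sigma| = m$, $\sigma <^{f} m'$ and $|\sigma'| = m'$ then $(t \cdot \sigma) \cdot \sigma' = t \cdot (\sigma \cdot \sigma')$.

2. If $\sigma_1 <^{f} m$, $|\sigma| = m$, $\sigma <^{f} m'$ and $|\sigma'| = m'$ then $(\sigma_1 \cdot \sigma) \cdot \sigma' = \sigma_1 \cdot (\sigma \cdot \sigma')$.

Entirely similar to the above.

Lemma B.20 (Extension of lemma A.24) If $|\sigma| = m$ and $|\Phi| = m$ then $id_{\Phi} \cdot \sigma = \sigma$.

Identical as before.

Lemma B.21 (Extension of lemma A.25) 1. If $\lceil t \rceil^{n}_{m} = \lceil t' \rceil^{n}_{m}$ then $t = t'$.

2. If $\lceil \sigma \rceil^{n}_{m} = \lceil \sigma' \rceil^{n}_{m}$ then $\sigma = \sigma'$.

Part 1 is identical as before, with the additional case $t = X_i/\sigma$ and $t' = X_i/\sigma'$ handled using the second part. Part 2 is proved by induction on the structure of $\sigma$.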

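Theorem B.22 (Extension of main substitution theorem A.26 and corollary A.27) 1. If $\mathcal{M}; \Phi \vdash t : t'$ and $\mathcal{M}; \Phi' \vdash \sigma : \Phi$ then $\mathcal{M}; \Phi' \vdash t \cdot \sigma : t' \cdot \sigma$.

2. If $\mathcal{M}; \Phi' \vdash \sigma : \Phi$ and $\mathcal{M}; \Phi'' \vdash \sigma' : \Phi'$ then $\mathcal{M}; \Phi'' \vdash \sigma \cdot \sigma' : \Phi$.

3. If $\mathcal{M} \vdash [\Phi']\, t : [\Phi']\, t'$ and $\mathcal{M}; \Phi \vdash \sigma : \Phi'$ then $\mathcal{M} \vdash [\Phi]\, t \cdot \sigma : [\Phi]\, t' \cdot \sigma$.

For the first part we have, when $t = X_i/\sigma_0$:

From $\mathcal{M}; \Phi \vdash X_i/\sigma_0 : t'$ we get that $\mathcal{M}.i = [\Phi_0]\, t_0$, $\mathcal{M}; \Phi \vdash \sigma_0 : \Phi_0$ and $t' = t_0 \cdot \sigma_0$.

Applying the second part of the lemma for $\sigma = \sigma_0$ and $\sigma' = \sigma$ we get that $\mathcal{M}; \Phi' \vdash \sigma_0 \cdot \sigma : \Phi_0$.

Thus applying the same typing rule for $t = X_i/(\sigma_0 \cdot \sigma)$ we get that $\mathcal{M}; \Phi' \vdash X_i/(\sigma_0 \cdot \sigma) : t_0 \cdot (\sigma_0 \cdot \sigma)$.

Taking into account the definition of $\cdot$ and also lemma B.19, we have that this is the desired result.

For the second part, the proof is identical to the proof done earlier.

For the third part, by typing inversion for $[\Phi']\, t$ we get that $\mathcal{M}; \Phi' \vdash t : t'$.

Using the first part we get that $\mathcal{M}; \Phi \vdash t \cdot \sigma : t' \cdot \sigma$.

Using the typing rule for modal terms we get $\mathcal{M} \vdash [\Phi]\, t \cdot \sigma : [\Phi]\, t' \cdot \sigma$.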

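Lemma B.23 (Meta-variables context weakening) 1. If $\mathcal{M}; \Phi \vdash t : t'$ then $\mathcal{M}, T_1, \cdots, T_n; \Phi \vdash t : t'$.

2. If $\mathcal{M}; \Phi \vdash \sigma : \Phi'$ then $\mathcal{M}, T_1, \cdots, T_n; \Phi \vdash \sigma : \Phi'$.

3. If $\mathcal{M} \vdash \Phi\ \text{wf}$ then $\mathcal{M}, T_1, \cdots, T_n \vdash \Phi\ \text{wf}$.

4. If $\mathcal{M} \vdash T : T'$ then $\mathcal{M}, T_1, \cdots, T_n \vdash T : T'$.

All are trivial by induction on the typing derivations.

Lemma B.24 (Extension of lemma A.28) If $\mathcal{M}; \Phi \vdash t : t'$ then either $t' = \text{Type}'$ or $\mathcal{M}; \Phi \vdash t' : s$.

When $t = X_i/\sigma$, by inversion of typing we get $\mathcal{M}.i = [\Phi']\, t''$, $\mathcal{M}; \Phi \vdash \sigma : \Phi'$ and $t' = t'' \cdot \sigma$.

By inversion of well-formedness for $\mathcal{M}$ and lemma 4, we get that $\mathcal{M} \vdash \mathcal{M}.i : [\Phi']\, s$.

Furthermore by inversion of that we get $\mathcal{M}; \Phi' \vdash t'' : s$.

By application of the substitution lemma B.22 for $t''$ and $\sigma$ we get $\mathcal{M}; \Phi \vdash t'' \cdot \sigma : s$, which is the desired result.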
Lemma B.25 (Extension of the lemma A.29 and new lemma for substitutions) 1. If $\mathcal{M}; \Phi \vdash t : t'$ then $\mathcal{M}; \Phi, t_1, t_2, \cdots, t_n \vdash t : t'$.

2. If $\mathcal{M}; \Phi \vdash \sigma : \Phi'$ then $\mathcal{M}; \Phi, t_1, t_2, \cdots, t_n \vdash \sigma : \Phi'$.


For the first part, proceed identically as before.

For the second part, the proof is entirely similar to the first part (construct and prove well-typedness of identity substitution, and then appeal to the substitution theorem).


Now we know that all the theorems we had proved for the non-extended version still hold. We can now prove a new meta-substitution theorem. Before doing that we need some new definitions.


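Definition B.26 (Substitutions of meta-variables) The syntax of substitutions of meta-variables is defined as follows.


$\sigma_{\mathcal{M}} ::= \bullet \mid \sigma_{\mathcal{M}},\ T$


Definition B.27 (Meta-substitution length and access) We define the length of meta-substitutions and accessing the $i$-th element as follows.


$|\sigma_{\mathcal{M}}|$:

$|\bullet| = 0$

$|\sigma_{\mathcal{M}}, T| = |\sigma_{\mathcal{M}}| + 1$


$\sigma_{\mathcal{M}}.i$:

$(\sigma_{\mathcal{M}}, T).|\sigma_{\mathcal{M}}| = T$

$(\sigma_{\mathcal{M}}, T).i = \sigma_{\mathcal{M}}.i$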

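Definition B.28 (Meta-substitution application) The application of meta-substitutions is defined as follows. We mark the interesting cases with a star.

$t \cdot \sigma_{\mathcal{M}}$:

$s \cdot \sigma_{\mathcal{M}} = s$
$c \cdot \sigma_{\mathcal{M}} = c$
$f_i \cdot \sigma_{\mathcal{M}} = f_i$
$b_i \cdot \sigma_{\mathcal{M}} = b_i$
$(\lambda(t_1).t_2) \cdot \sigma_{\mathcal{M}} = \lambda(t_1 \cdot \sigma_{\mathcal{M}}).(t_2 \cdot \sigma_{\mathcal{M}})$
$(t_1\ t_2) \cdot \sigma_{\mathcal{M}} = (t_1 \cdot \sigma_{\mathcal{M}})\ (t_2 \cdot \sigma_{\mathcal{M}})$
$(\Pi(t_1).t_2) \cdot \sigma_{\mathcal{M}} = \Pi(t_1 \cdot \sigma_{\mathcal{M}}).(t_2 \cdot \sigma_{\mathcal{M}})$
$(t_1 = t_2) \cdot \sigma_{\mathcal{M}} = (t_1 \cdot \sigma_{\mathcal{M}}) = (t_2 \cdot \sigma_{\mathcal{M}})$
$(\text{conv}\ t_1\ t_2) \cdot \sigma_{\mathcal{M}} = \text{conv}\ (t_1 \cdot \sigma_{\mathcal{M}})\ (t_2 \cdot \sigma_{\mathcal{M}})$
$(\text{refl}\ t) \cdot \sigma_{\mathcal{M}} = \text{refl}\ (t \cdot \sigma_{\mathcal{M}})$
$(\text{symm}\ t) \cdot \sigma_{\mathcal{M}} = \text{symm}\ (t \cdot \sigma_{\mathcal{M}})$
$(\text{trans}\ t_1\ t_2) \cdot \sigma_{\mathcal{M}} = \text{trans}\ (t_1 \cdot \sigma_{\mathcal{M}})\ (t_2 \cdot \sigma_{\mathcal{M}})$
$(\text{congapp}\ t_1\ t_2) \cdot \sigma_{\mathcal{M}} = \text{congapp}\ (t_1 \cdot \sigma_{\mathcal{M}})\ (t_2 \cdot \sigma_{\mathcal{M}})$
$(\text{congimpl}\ t_1\ t_2) \cdot \sigma_{\mathcal{M}} = \text{congimpl}\ (t_1 \cdot \sigma_{\mathcal{M}})\ (t_2 \cdot \sigma_{\mathcal{M}})$
$(\text{conglam}\ t) \cdot \sigma_{\mathcal{M}} = \text{conglam}\ (t \cdot \sigma_{\mathcal{M}})$
$(\text{congpi}\ t) \cdot \sigma_{\mathcal{M}} = \text{congpi}\ (t \cdot \sigma_{\mathcal{M}})$
$(\text{beta}\ t_1\ t_2) \cdot \sigma_{\mathcal{M}} = \text{beta}\ (t_1 \cdot \sigma_{\mathcal{M}})\ (t_2 \cdot \sigma_{\mathcal{M}})$
* $(X_i/\sigma) \cdot \sigma_{\mathcal{M}} = \sigma_{\mathcal{M}}.i \cdot (\sigma \cdot \sigma_{\mathcal{M}})$

$\sigma \cdot \sigma_{\mathcal{M}}$:

$\bullet \cdot \sigma_{\mathcal{M}} = \bullet$
$(\sigma, t) \cdot \sigma_{\mathcal{M}} = \sigma \cdot \sigma_{\mathcal{M}},\ t \cdot \sigma_{\mathcal{M}}$

$\Phi \cdot \sigma_{\mathcal{M}}$:

$\bullet \cdot \sigma_{\mathcal{M}} = \bullet$
$(\Phi, t) \cdot \sigma_{\mathcal{M}} = \Phi \cdot \sigma_{\mathcal{M}},\ t \cdot \sigma_{\mathcal{M}}$

$T \cdot \sigma_{\mathcal{M}}$:

* $([\Phi]\, t) \cdot \sigma_{\mathcal{M}} = [\Phi \cdot \sigma_{\mathcal{M}}]\,(t \cdot \sigma_{\mathcal{M}})$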
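The starred metavariable case is the one place where meta-substitution does real work: the stored substitution is first pushed through $\sigma_{\mathcal{M}}$ and then applied to the term that replaces $X_i$. A minimal Python sketch of that case follows; the encoding (`("meta", i, sigma)` for $X_i/\sigma$, a pair `(phi, t)` for $[\Phi]\,t$) and the names `subst` and `msubst` are assumptions of the sketch, and only a few term formers are covered.

```python
def subst(t, sigma):
    """Ordinary substitution application, including the metavariable clause of B.5."""
    tag = t[0]
    if tag == "f":
        return sigma[t[1]]
    if tag == "meta":
        return ("meta", t[1], tuple(subst(u, sigma) for u in t[2]))
    if tag in ("lam", "pi", "app"):
        return (tag, subst(t[1], sigma), subst(t[2], sigma))
    return t

def msubst(t, sm):
    """Meta-substitution application t . sigma_M, with sm a tuple of modal terms."""
    tag = t[0]
    if tag == "meta":
        _phi, body = sm[t[1]]                              # sigma_M.i = [Phi] body
        pushed = tuple(msubst(u, sm) for u in t[2])        # sigma . sigma_M
        return subst(body, pushed)                         # body . (sigma . sigma_M)
    if tag in ("lam", "pi", "app"):
        return (tag, msubst(t[1], sm), msubst(t[2], sm))
    return t                                               # s, c, f_i, b_i are unchanged

phi = (("c", "nat"),)                                      # a one-variable context
body = ("app", ("f", 0), ("c", "k"))                       # a term over phi
sm = ((phi, body),)                                        # sigma_M = ([phi] body)

t = ("app", ("meta", 0, (("f", 1),)), ("f", 0))            # (X_0 / (f_1)) f_0
assert msubst(t, sm) == ("app", ("app", ("f", 1), ("c", "k")), ("f", 0))
```

In the example, $X_0$ stands for a term over a one-variable context; the use $X_0/(f_1)$ instantiates that variable with $f_1$, so the result contains `("f", 1)` where `body` had `("f", 0)`.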

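Definition B.29 (Meta-substitution typing) The typing judgement for meta-substitutions is as follows.


$\mathcal{M} \vdash \sigma_{\mathcal{M}} : \mathcal{M}'$:


$\dfrac{}{\mathcal{M} \vdash \bullet : \bullet}$ $\qquad$ $\dfrac{\mathcal{M} \vdash \sigma_{\mathcal{M}} : \mathcal{M}' \qquad \mathcal{M} \vdash T : T' \cdot \sigma_{\mathcal{M}}}{\mathcal{M} \vdash (\sigma_{\mathcal{M}}, T) : (\mathcal{M}', T')}$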


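We proceed to prove the meta-substitution theorem.

The lemmas that we need are the following:

Lemma B.30 (Limits for elements of metasubstitutions) If $\mathcal{M} \vdash \sigma_{\mathcal{M}} : \mathcal{M}'$ and $\sigma_{\mathcal{M}}.i = [\Phi]\, t$ then $t <^{f} |\Phi|$ and $t <^{b} 0$.

By repeated inversion of typing for $\sigma_{\mathcal{M}}$ we get that $\mathcal{M}' \vdash \sigma_{\mathcal{M}}.i : T'$ for some $\mathcal{M}'$ and $T'$. By inversion we get that $\mathcal{M}'; \Phi \vdash t : t'$. By use of lemma 2 we get the desired result.

Lemma B.31 (Freshen on closed term) If $t <^{b} n$ then $\lceil t \cdot \sigma \rceil^{n}_{m} = t \cdot \lceil \sigma \rceil^{n}_{m}$.

Easy by induction on $t$.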
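Lemma B.32 (Interaction of freshen and metasubstitution application) 1. If $\mathcal{M} \vdash \sigma_{\mathcal{M}} : \mathcal{M}'$ then $\lceil t \rceil^{n}_{m} \cdot \sigma_{\mathcal{M}} = \lceil t \cdot \sigma_{\mathcal{M}} \rceil^{n}_{m}$.

2. If $\mathcal{M} \vdash \sigma_{\mathcal{M}} : \mathcal{M}'$ then $\lceil \sigma \rceil^{n}_{m} \cdot \sigma_{\mathcal{M}} = \lceil \sigma \cdot \sigma_{\mathcal{M}} \rceil^{n}_{m}$.

The first part is proved by induction on $t$. The interesting case is the metavariables case, where we have the following.

$\lceil X_i/\sigma \rceil^{n}_{m} \cdot \sigma_{\mathcal{M}} = (X_i/\lceil \sigma \rceil^{n}_{m}) \cdot \sigma_{\mathcal{M}} = \sigma_{\mathcal{M}}.i \cdot (\lceil \sigma \rceil^{n}_{m} \cdot \sigma_{\mathcal{M}}) = \sigma_{\mathcal{M}}.i \cdot \lceil \sigma \cdot \sigma_{\mathcal{M}} \rceil^{n}_{m}$ based on the second part.

Now $\sigma_{\mathcal{M}}.i = [\Phi]\, t$ and the above is further equal to: $t \cdot \lceil \sigma \cdot \sigma_{\mathcal{M}} \rceil^{n}_{m}$. The right-hand side is rewritten as follows: $\lceil X_i/\sigma \cdot \sigma_{\mathcal{M}} \rceil^{n}_{m} = \lceil \sigma_{\mathcal{M}}.i \cdot (\sigma \cdot \sigma_{\mathcal{M}}) \rceil^{n}_{m} = \lceil t \cdot (\sigma \cdot \sigma_{\mathcal{M}}) \rceil^{n}_{m} = t \cdot \lceil \sigma \cdot \sigma_{\mathcal{M}} \rceil^{n}_{m}$ using lemma B.31 and also B.30.

The second part is proved trivially using induction.

Lemma B.33 (Bind on closed term) If $t <^{b} n$ then $\lfloor t \cdot \sigma \rfloor^{n}_{m} = t \cdot \lfloor \sigma \rfloor^{n}_{m}$.

Easy by induction on $t$.

Lemma B.34 (Interaction of bind and metasubstitution application) 1. If $\mathcal{M} \vdash \sigma_{\mathcal{M}} : \mathcal{M}'$ then $\lfloor t \rfloor^{n}_{m} \cdot \sigma_{\mathcal{M}} = \lfloor t \cdot \sigma_{\mathcal{M}} \rfloor^{n}_{m}$.

2. If $\mathcal{M} \vdash \sigma_{\mathcal{M}} : \mathcal{M}'$ then $\lfloor \sigma \rfloor^{n}_{m} \cdot \sigma_{\mathcal{M}} = \lfloor \sigma \cdot \sigma_{\mathcal{M}} \rfloor^{n}_{m}$.

Similar to the equivalent lemma for freshen.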

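Lemma B.35 (Interaction of substitution application and metasubstitution application) 1. $(t \cdot \sigma) \cdot \sigma_{\mathcal{M}} = (t \cdot \sigma_{\mathcal{M}}) \cdot (\sigma \cdot \sigma_{\mathcal{M}})$

2. $(\sigma \cdot \sigma') \cdot \sigma_{\mathcal{M}} = (\sigma \cdot \sigma_{\mathcal{M}}) \cdot (\sigma' \cdot \sigma_{\mathcal{M}})$

In the first part, we perform induction on $t$. The interesting case is the metavariables case. We have:

$((X_i/\sigma') \cdot \sigma) \cdot \sigma_{\mathcal{M}} = (X_i/(\sigma' \cdot \sigma)) \cdot \sigma_{\mathcal{M}} = \sigma_{\mathcal{M}}.i \cdot ((\sigma' \cdot \sigma) \cdot \sigma_{\mathcal{M}})$.

From the second part, this is equal to: $\sigma_{\mathcal{M}}.i \cdot ((\sigma' \cdot \sigma_{\mathcal{M}}) \cdot (\sigma \cdot \sigma_{\mathcal{M}}))$.

There exists a $t$ such that $\sigma_{\mathcal{M}}.i = [\Phi]\, t$ and thus the above is further equal to: $t \cdot ((\sigma' \cdot \sigma_{\mathcal{M}}) \cdot (\sigma \cdot \sigma_{\mathcal{M}})) = (t \cdot (\sigma' \cdot \sigma_{\mathcal{M}})) \cdot (\sigma \cdot \sigma_{\mathcal{M}})$ based on lemma B.19.

The right-hand side is written as: $((X_i/\sigma') \cdot \sigma_{\mathcal{M}}) \cdot (\sigma \cdot \sigma_{\mathcal{M}}) = (t \cdot (\sigma' \cdot \sigma_{\mathcal{M}})) \cdot (\sigma \cdot \sigma_{\mathcal{M}})$. Thus the desired result.

The second part is trivially proved by induction and use of the first part.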
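The first part of lemma B.35 can be tested on a term in which a metavariable occurs both in $t$ and in $\sigma$. As before this is a sketch under assumptions: the encoding and the names `subst`, `msubst` and `msubst_all` are hypothetical, and meta-substitution is implemented following definition B.28 for the term formers used here.

```python
def subst(t, sigma):
    """Ordinary substitution application, including the metavariable clause."""
    tag = t[0]
    if tag == "f":
        return sigma[t[1]]
    if tag == "meta":
        return ("meta", t[1], tuple(subst(u, sigma) for u in t[2]))
    if tag in ("lam", "pi", "app"):
        return (tag, subst(t[1], sigma), subst(t[2], sigma))
    return t

def msubst(t, sm):
    """Meta-substitution application t . sigma_M."""
    tag = t[0]
    if tag == "meta":
        _phi, body = sm[t[1]]
        return subst(body, msubst_all(t[2], sm))
    if tag in ("lam", "pi", "app"):
        return (tag, msubst(t[1], sm), msubst(t[2], sm))
    return t

def msubst_all(sigma, sm):
    """sigma . sigma_M: apply the meta-substitution to each element of sigma."""
    return tuple(msubst(u, sm) for u in sigma)

sm = (((("c", "nat"),), ("app", ("f", 0), ("c", "k"))),)   # X_0 := [nat] (f_0 k)
t = ("app", ("meta", 0, (("f", 1),)), ("f", 0))
sigma = (("c", "a"), ("meta", 0, (("f", 0),)))

lhs = msubst(subst(t, sigma), sm)                          # (t . sigma) . sigma_M
rhs = subst(msubst(t, sm), msubst_all(sigma, sm))          # (t . sigma_M) . (sigma . sigma_M)
assert lhs == rhs
```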

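Lemma B.36 (Application of metasubstitution to identity substitution) $id_{\Phi} \cdot \sigma_{\mathcal{M}} = id_{\Phi \cdot \sigma_{\mathcal{M}}}$

Trivial by induction on $\Phi$.

Lemma B.37 (Redundant elements in metasubstitutions) 1. If $\mathcal{M}; \Phi \vdash t : t'$ and $|\sigma_{\mathcal{M}}| = |\mathcal{M}|$ then $t \cdot (\sigma_{\mathcal{M}}, T_1, T_2, \cdots, T_n) = t \cdot \sigma_{\mathcal{M}}$.

2. If $\mathcal{M}; \Phi \vdash \sigma : \Phi'$ and $|\sigma_{\mathcal{M}}| = |\mathcal{M}|$ then $\sigma \cdot (\sigma_{\mathcal{M}}, T_1, T_2, \cdots, T_n) = \sigma \cdot \sigma_{\mathcal{M}}$.

3. If $\mathcal{M} \vdash \Phi\ \text{wf}$ and $|\sigma_{\mathcal{M}}| = |\mathcal{M}|$ then $\Phi \cdot (\sigma_{\mathcal{M}}, T_1, T_2, \cdots, T_n) = \Phi \cdot \sigma_{\mathcal{M}}$.

4. If $\mathcal{M} \vdash T : T'$ and $|\sigma_{\mathcal{M}}| = |\mathcal{M}|$ then $T \cdot (\sigma_{\mathcal{M}}, T_1, T_2, \cdots, T_n) = T \cdot \sigma_{\mathcal{M}}$.


By induction on the typing derivations.

Lemma B.38 (Type of i-th metasubstitution element) If $\vdash \mathcal{M}\ \text{wf}$ and $\mathcal{M} \vdash \sigma_{\mathcal{M}} : \mathcal{M}'$ then $\mathcal{M} \vdash \sigma_{\mathcal{M}}.i : (\mathcal{M}'.i) \cdot \sigma_{\mathcal{M}}$.

By induction and use of lemma B.37; furthermore using inversion of the well-formedness relation for $\mathcal{M}$. Similar to lemma A.20.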


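Theorem B.39 (Substitution over metavariables) 1. If $\mathcal{M}; \Phi \vdash t : t'$ and $\mathcal{M}' \vdash \sigma_{\mathcal{M}} : \mathcal{M}$ then $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t \cdot \sigma_{\mathcal{M}} : t' \cdot \sigma_{\mathcal{M}}$.

2. If $\mathcal{M}; \Phi \vdash \sigma : \Phi'$ and $\mathcal{M}' \vdash \sigma_{\mathcal{M}} : \mathcal{M}$ then $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash \sigma \cdot \sigma_{\mathcal{M}} : \Phi' \cdot \sigma_{\mathcal{M}}$.

3. If $\mathcal{M} \vdash \Phi\ \text{wf}$ and $\mathcal{M}' \vdash \sigma_{\mathcal{M}} : \mathcal{M}$ then $\mathcal{M}' \vdash \Phi \cdot \sigma_{\mathcal{M}}\ \text{wf}$.

4. If $\mathcal{M} \vdash T : T'$ and $\mathcal{M}' \vdash \sigma_{\mathcal{M}} : \mathcal{M}$ then $\mathcal{M}' \vdash T \cdot \sigma_{\mathcal{M}} : T' \cdot \sigma_{\mathcal{M}}$.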


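Part 1 Proceed by structural induction on the typing of $t$.

Case $\dfrac{c : t \in \Sigma}{\mathcal{M}; \Phi \vdash_{\Sigma} c : t}$ $\triangleright$

From inversion of the well-formedness of $\Sigma$ we have that $\bullet \vdash t : s$.
From lemma B.37 we have that $t \cdot \sigma_{\mathcal{M}} = t$.

So the result follows from application of the same typing rule for $\Phi \cdot \sigma_{\mathcal{M}}$.

Case $\dfrac{\Phi.i = t}{\mathcal{M}; \Phi \vdash f_i : t}$ $\triangleright$

We have $t \cdot \sigma_{\mathcal{M}} = (\Phi \cdot \sigma_{\mathcal{M}}).i$, so using the same typing rule we get $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash f_i : t \cdot \sigma_{\mathcal{M}}$.

Case $\dfrac{(s, s') \in \mathcal{A}}{\mathcal{M}; \Phi \vdash s : s'}$ $\triangleright$

Trivial by application of the same rule and the definition of $\cdot$.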


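Case $\dfrac{\mathcal{M}; \Phi \vdash t_1 : s \qquad \mathcal{M}; \Phi, t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : s' \qquad (s, s', s'') \in \mathcal{R}}{\mathcal{M}; \Phi \vdash \Pi(t_1).t_2 : s''}$ $\triangleright$

By induction hypothesis for $t_1$ we get: $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t_1 \cdot \sigma_{\mathcal{M}} : s$.

By induction hypothesis for $\mathcal{M}; \Phi, t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : s'$ we get:

$\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}}, t_1 \cdot \sigma_{\mathcal{M}} \vdash \lceil t_2 \rceil_{|\Phi|} \cdot \sigma_{\mathcal{M}} : s' \cdot \sigma_{\mathcal{M}}$.

We have $s' = s' \cdot \sigma_{\mathcal{M}}$ trivially.

Also by the lemma B.32, $\lceil t_2 \rceil_{|\Phi|} \cdot \sigma_{\mathcal{M}} = \lceil t_2 \cdot \sigma_{\mathcal{M}} \rceil_{|\Phi|}$.

Thus by application of the same typing rule we get $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash \Pi(t_1 \cdot \sigma_{\mathcal{M}}).(t_2 \cdot \sigma_{\mathcal{M}}) : s''$, which is the desired result.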


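Case $\dfrac{\mathcal{M}; \Phi \vdash t_1 : s \qquad \mathcal{M}; \Phi, t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : t' \qquad \mathcal{M}; \Phi \vdash \Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1} : s'}{\mathcal{M}; \Phi \vdash \lambda(t_1).t_2 : \Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}}$ $\triangleright$

Similarly to the above, from the inductive hypothesis for $t_1$ and $t_2$ (and use of lemma B.32) we get:

$\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t_1 \cdot \sigma_{\mathcal{M}} : s$

$\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}}, t_1 \cdot \sigma_{\mathcal{M}} \vdash \lceil t_2 \cdot \sigma_{\mathcal{M}} \rceil_{|\Phi|} : t' \cdot \sigma_{\mathcal{M}}$

From the inductive hypothesis for $\Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}$ we get: $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash (\Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}) \cdot \sigma_{\mathcal{M}} : s'$.

By the definition of $\cdot$ we get: $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash \Pi(t_1 \cdot \sigma_{\mathcal{M}}).(\lfloor t' \rfloor_{|\Phi|+1} \cdot \sigma_{\mathcal{M}}) : s'$.

By the lemma B.34, we have that $(\lfloor t' \rfloor_{|\Phi|+1} \cdot \sigma_{\mathcal{M}}) = \lfloor t' \cdot \sigma_{\mathcal{M}} \rfloor_{|\Phi|+1}$.

Thus we get $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash \Pi(t_1 \cdot \sigma_{\mathcal{M}}).\lfloor t' \cdot \sigma_{\mathcal{M}} \rfloor_{|\Phi|+1} : s'$.

We can now apply the same typing rule to get: $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash \lambda(t_1 \cdot \sigma_{\mathcal{M}}).(t_2 \cdot \sigma_{\mathcal{M}}) : \Pi(t_1 \cdot \sigma_{\mathcal{M}}).\lfloor t' \cdot \sigma_{\mathcal{M}} \rfloor_{|\Phi|+1}$.

We have $\Pi(t_1 \cdot \sigma_{\mathcal{M}}).\lfloor t' \cdot \sigma_{\mathcal{M}} \rfloor_{|\Phi|+1} = \Pi(t_1 \cdot \sigma_{\mathcal{M}}).((\lfloor t' \rfloor_{|\Phi|+1}) \cdot \sigma_{\mathcal{M}}) = (\Pi(t_1).\lfloor t' \rfloor_{|\Phi|+1}) \cdot \sigma_{\mathcal{M}}$, thus this is the desired result.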




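Case $\dfrac{\mathcal{M}; \Phi \vdash t_1 : \Pi(t).t' \qquad \mathcal{M}; \Phi \vdash t_2 : t}{\mathcal{M}; \Phi \vdash t_1\ t_2 : \lceil t' \rceil_{|\Phi|} \cdot (id_{\Phi}, t_2)}$ $\triangleright$

By induction hypothesis for $t_1$ we get $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t_1 \cdot \sigma_{\mathcal{M}} : \Pi(t \cdot \sigma_{\mathcal{M}}).(t' \cdot \sigma_{\mathcal{M}})$.

By induction hypothesis for $t_2$ we get $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t_2 \cdot \sigma_{\mathcal{M}} : t \cdot \sigma_{\mathcal{M}}$.

By application of the same typing rule we get $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash (t_1\ t_2) \cdot \sigma_{\mathcal{M}} : \lceil t' \cdot \sigma_{\mathcal{M}} \rceil_{|\Phi|} \cdot (id_{\Phi \cdot \sigma_{\mathcal{M}}}, t_2 \cdot \sigma_{\mathcal{M}})$.

We need to prove that $(\lceil t' \rceil_{|\Phi|} \cdot (id_{\Phi}, t_2)) \cdot \sigma_{\mathcal{M}} = \lceil t' \cdot \sigma_{\mathcal{M}} \rceil_{|\Phi|} \cdot (id_{\Phi \cdot \sigma_{\mathcal{M}}}, t_2 \cdot \sigma_{\mathcal{M}})$.

From lemma B.35 we have that $(\lceil t' \rceil_{|\Phi|} \cdot (id_{\Phi}, t_2)) \cdot \sigma_{\mathcal{M}} = (\lceil t' \rceil_{|\Phi|} \cdot \sigma_{\mathcal{M}}) \cdot ((id_{\Phi}, t_2) \cdot \sigma_{\mathcal{M}})$.

From lemma B.32 we get that this is further equal to: $(\lceil t' \cdot \sigma_{\mathcal{M}} \rceil_{|\Phi|}) \cdot ((id_{\Phi}, t_2) \cdot \sigma_{\mathcal{M}})$.
From definition of $\cdot$ we get that this is equal to $(\lceil t' \cdot \sigma_{\mathcal{M}} \rceil_{|\Phi|}) \cdot (id_{\Phi} \cdot \sigma_{\mathcal{M}}, t_2 \cdot \sigma_{\mathcal{M}})$.
Last, from B.36 we get the desired result.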

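Case $\dfrac{\mathcal{M}.i = T \qquad T = [\Phi']\, t' \qquad \mathcal{M}; \Phi \vdash \sigma : \Phi'}{\mathcal{M}; \Phi \vdash X_i/\sigma : t' \cdot \sigma}$ $\triangleright$

Assuming that $\sigma_{\mathcal{M}}.i = [\Phi'']\, t$, we need to show that $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t \cdot (\sigma \cdot \sigma_{\mathcal{M}}) : (t' \cdot \sigma) \cdot \sigma_{\mathcal{M}}$.

From lemma B.35, we have that $(t' \cdot \sigma) \cdot \sigma_{\mathcal{M}} = (t' \cdot \sigma_{\mathcal{M}}) \cdot (\sigma \cdot \sigma_{\mathcal{M}})$.

So equivalently we need to show $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t \cdot (\sigma \cdot \sigma_{\mathcal{M}}) : (t' \cdot \sigma_{\mathcal{M}}) \cdot (\sigma \cdot \sigma_{\mathcal{M}})$.
Using the second part of the lemma for $\sigma$ we get: $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash \sigma \cdot \sigma_{\mathcal{M}} : \Phi' \cdot \sigma_{\mathcal{M}}$.

From lemma B.38 we get that $\mathcal{M}' \vdash \sigma_{\mathcal{M}}.i : \mathcal{M}.i \cdot \sigma_{\mathcal{M}}$.
From hypothesis we have that $\mathcal{M}.i = [\Phi']\, t'$.

Thus the above typing judgement is rewritten as $\mathcal{M}' \vdash \sigma_{\mathcal{M}}.i : [\Phi' \cdot \sigma_{\mathcal{M}}]\, t' \cdot \sigma_{\mathcal{M}}$.

By inversion we get that $\sigma_{\mathcal{M}}.i = [\Phi' \cdot \sigma_{\mathcal{M}}]\, t$ and that $\mathcal{M}'; \Phi' \cdot \sigma_{\mathcal{M}} \vdash t : t' \cdot \sigma_{\mathcal{M}}$.
** Now we use the main substitution theorem B.22 for $t$ and $\sigma \cdot \sigma_{\mathcal{M}}$ and get:

$\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t \cdot (\sigma \cdot \sigma_{\mathcal{M}}) : (t' \cdot \sigma_{\mathcal{M}}) \cdot (\sigma \cdot \sigma_{\mathcal{M}})$.

Case (otherwise) $\triangleright$

Simple to prove based on the methods we have shown above.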

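Part 2 By induction on the typing derivation of $\sigma$.

Case $\dfrac{\mathcal{M} \vdash \Phi\ \text{wf}}{\mathcal{M}; \Phi \vdash \bullet : \bullet}$ $\triangleright$ Use of the same typing rule, for $\Phi \cdot \sigma_{\mathcal{M}}$ which is well formed based on part 3.

Case $\dfrac{\mathcal{M}; \Phi \vdash \sigma : \Phi' \qquad \mathcal{M}; \Phi \vdash t : t' \cdot \sigma}{\mathcal{M}; \Phi \vdash \sigma, t : (\Phi', t')}$ $\triangleright$ By induction hypothesis and use of part 1 we get:

$\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash \sigma \cdot \sigma_{\mathcal{M}} : \Phi' \cdot \sigma_{\mathcal{M}}$
$\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t \cdot \sigma_{\mathcal{M}} : (t' \cdot \sigma) \cdot \sigma_{\mathcal{M}}$

By use of lemma B.35 in the typing for $t \cdot \sigma_{\mathcal{M}}$ we get that:

$\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t \cdot \sigma_{\mathcal{M}} : (t' \cdot \sigma_{\mathcal{M}}) \cdot (\sigma \cdot \sigma_{\mathcal{M}})$

By use of the same typing rule we get: $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash (\sigma \cdot \sigma_{\mathcal{M}},\ t \cdot \sigma_{\mathcal{M}}) : (\Phi' \cdot \sigma_{\mathcal{M}},\ t' \cdot \sigma_{\mathcal{M}})$.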

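Part 3 By induction on the well-formedness derivation of $\Phi$.

Case $\mathcal{M} \vdash \bullet\ \text{wf}$ $\triangleright$

Trivial use of the same typing rule.

Case $\dfrac{\mathcal{M} \vdash \Phi\ \text{wf} \qquad \mathcal{M}; \Phi \vdash t : s}{\mathcal{M} \vdash \Phi, t\ \text{wf}}$ $\triangleright$

Use of induction hypothesis, part 2, and the same typing rule.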
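Part 4 By induction on the typing derivation for $T$.

Case $\dfrac{\mathcal{M}; \Phi \vdash t : t'}{\mathcal{M} \vdash [\Phi]\, t : [\Phi]\, t'}$ $\triangleright$

Using part 1 we get $\mathcal{M}'; \Phi \cdot \sigma_{\mathcal{M}} \vdash t \cdot \sigma_{\mathcal{M}} : t' \cdot \sigma_{\mathcal{M}}$. Thus using the same typing rule we get $\mathcal{M}' \vdash [\Phi \cdot \sigma_{\mathcal{M}}]\, t \cdot \sigma_{\mathcal{M}} : [\Phi \cdot \sigma_{\mathcal{M}}]\, t' \cdot \sigma_{\mathcal{M}}$, which is the desired result.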

B.2  Extension  with  metavariables  and  polymorphic  contexts 

In  order  to  incorporate  polymorphic  contexts,  we  change  the  representation  of  free  variables  from  a  deBruijn 
level  to  an  index  into  a  parametric  context.  We  thus  need  to  redefine  the  notions  of  length  of  a  context,  variable 
limits  etc.  in  order  to  be  compatible  with  the  new  definition  of  free  variables. 

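Definition B.40 (Syntax of the language) The syntax of the logic language is extended below. We use the syntactic class $T$ for modal terms and modal contexts, and the syntactic class $K$ for their classifiers (modal terms and context prefixes). Furthermore, we use a single context $\Psi$ for both extensions.


$\Phi ::= \cdots \mid \Phi, X_i$

$\sigma ::= \cdots \mid \sigma, id(X_i)$

$\Psi ::= \bullet \mid \Psi, K$

$t ::= s \mid c \mid f_I \mid b_i \mid \lambda(t_1).t_2 \mid t_1\ t_2 \mid \Pi(t_1).t_2 \mid t_1 = t_2 \mid \text{conv}\ t_1\ t_2 \mid \text{refl}\ t \mid \text{symm}\ t \mid \text{trans}\ t_1\ t_2 \mid \text{congapp}\ t_1\ t_2 \mid \text{congimpl}\ t_1\ t_2 \mid \text{conglam}\ t \mid \text{congpi}\ t \mid \text{beta}\ t_1\ t_2 \mid X_i/\sigma$

$T ::= [\Phi]\, t \mid [\Phi]\, \Phi'$

$K ::= [\Phi]\, t \mid [\Phi]\, \text{ctx}$

$I ::= \bullet \mid I, \bullet \mid I, |X_i|$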

Definition B.41 (Substitution length) Redefinition of B.3.


$|\sigma| = I$:


$|\bullet| = \bullet$
$|\sigma, t| = |\sigma|, \bullet$
$|\sigma, id(X_i)| = |\sigma|, |X_i|$


Definition B.42 (Ordering of indexes) We define what it means for an index to be less than another index.


$I < I'$:

$I < I', \bullet$ when $I = I'$ or $I < I'$

$I < I', |X_i|$ when $I = I'$ or $I < I'$


$I \le I'$:

$I \le I'$ when $I = I'$ or $I < I'$


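Definition B.43 (Substitution access) Redefinition of B.4. We assume $I < |\sigma|$.

\[
\begin{array}{lcll}
(\sigma,\ t).I &=& t & \text{when } |\sigma| = I \\
(\sigma,\ t).I &=& \sigma.I & \text{otherwise} \\
(\sigma,\ \mathsf{id}(X_i)).I &=& f_I & \text{when } |\sigma| = I \\
(\sigma,\ \mathsf{id}(X_i)).I &=& \sigma.I & \text{otherwise}
\end{array}
\]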


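Definition B.44 (Context length and access) Redefinition of context length and context access, from definition B.2. Furthermore we define length and element access for environments of contexts. Element access assumes $I < |\Phi|$.

\[
\begin{array}{lcl}
|\Phi,\ t| &=& |\Phi|,\ \cdot \\
|\Phi,\ X_i| &=& |\Phi|,\ |X_i|
\end{array}
\]

\[
\begin{array}{lcll}
(\Phi,\ t).I &=& t & \text{when } |\Phi| = I \\
(\Phi,\ t).I &=& \Phi.I & \text{otherwise} \\
(\Phi,\ X_i).I &=& X_i & \text{when } |\Phi| = I \\
(\Phi,\ X_i).I &=& \Phi.I & \text{otherwise}
\end{array}
\]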
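The length, ordering and access operations of Definitions B.42 to B.44 can be exercised on small examples. The following is a minimal Python sketch under an assumed encoding that is not part of the development: a context is a list of entries `("t", term)` or `("X", i)`, and an index is a tuple whose components are `"."` or `("X", i)`. The function names `length`, `lt` and `access` are hypothetical.

```python
def length(ctx):
    """|Phi|: one '.' per term entry and ('X', i) per context variable X_i."""
    return tuple("." if kind == "t" else ("X", payload) for kind, payload in ctx)


def lt(i, j):
    """I < I': I is a proper prefix of I' (Definition B.42)."""
    return len(i) < len(j) and j[:len(i)] == i


def access(ctx, index):
    """Phi.I: the entry that sits right after the prefix of length I."""
    if not lt(index, length(ctx)):
        raise ValueError("element access assumes I < |Phi|")
    kind, payload = ctx[len(index)]
    return payload if kind == "t" else ("X", payload)


# Example context: A, X_0, B
phi = [("t", "A"), ("X", 0), ("t", "B")]
print(length(phi))
print(access(phi, (".", ("X", 0))))
```

Reading an index as "the length of the prefix that precedes the entry" is what makes the `when |Phi| = I` side conditions of the definition collapse to a single list lookup in this encoding.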


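Definition B.45 (Extensions context length and access) New definition.

\[
\begin{array}{lcll}
|\bullet| &=& 0 & \\
|\Psi,\ K| &=& |\Psi| + 1 & \\[4pt]
(\Psi,\ K).|\Psi| &=& K & \\
(\Psi,\ K).i &=& \Psi.i & \text{when } i < |\Psi|
\end{array}
\]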

Definition B.46 (Substitution application) Extension of substitution application from definition B.5. The application of a substitution to a term is identical to before, with a slight adjustment for the new definitions of variable indexes.

\[
(\sigma',\ \mathsf{id}(X_i)) \cdot \sigma = \sigma' \cdot \sigma,\ \mathsf{id}(X_i)
\]

Definition B.47 (Identity substitution) Redefinition of identity substitution from B.6.

\[
\begin{array}{lcl}
\mathsf{id}_{\bullet} &=& \bullet \\
\mathsf{id}_{\Phi,\, t} &=& \mathsf{id}_{\Phi},\ f_{|\Phi|} \\
\mathsf{id}_{\Phi,\, X_i} &=& \mathsf{id}_{\Phi},\ \mathsf{id}(X_i)
\end{array}
\]
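The identity substitution of Definition B.47 can be sketched in the same spirit. This is an illustrative Python sketch only; the list encoding (`("t", term)`, `("X", i)` for contexts, `("t", term)`, `("id", i)` for substitutions, `("f", I)` for a free variable) and the function names are assumptions made for the example.

```python
def length(entries):
    """Length index of a context or a substitution in the list encoding."""
    return tuple("." if kind == "t" else ("X", payload) for kind, payload in entries)


def identity(ctx):
    """id_Phi: f_{|Phi'|} for each term entry, id(X_i) for each context variable."""
    sub = []
    for pos, (kind, payload) in enumerate(ctx):
        if kind == "t":
            # The variable is indexed by the length of the prefix before it.
            sub.append(("t", ("f", length(ctx[:pos]))))
        else:
            sub.append(("id", payload))
    return sub


phi = [("t", "A"), ("X", 0), ("t", "B")]
print(identity(phi))
```

In this encoding the identity substitution has the same length index as the context it is built from, which is the shape that the typing rules for substitutions rely on.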


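Definition B.48 (Variable limits for terms and substitutions) Redefinition of the definition B.7.

Limits for terms, $t <^f I$:

\[
\begin{array}{lcl}
s <^f I & & \\
c <^f I & & \\
f_{I'} <^f I &\Leftarrow& I' < I \\
b_i <^f I & & \\
(\lambda(t_1).t_2) <^f I &\Leftarrow& t_1 <^f I \wedge t_2 <^f I \\
t_1\ t_2 <^f I &\Leftarrow& t_1 <^f I \wedge t_2 <^f I
\end{array}
\]

Limits for substitutions, $\sigma <^f I$ and $\sigma <^b n$:

\[
\begin{array}{lcl}
\bullet <^f I & & \\
\sigma,\ t <^f I &\Leftarrow& \sigma <^f I \wedge t <^f I \\
\sigma,\ \mathsf{id}(X_i) <^f I &\Leftarrow& \sigma <^f I \wedge \exists I'.\ (I',\ |X_i|) \le I \\
\sigma,\ \mathsf{id}(X_i) <^b n &\Leftarrow& \sigma <^b n
\end{array}
\]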


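Definition B.49 (Extension of freshening) This is an extension of definition B.8 and adjustment for indexes. We assume $t <^f I$ and $\sigma <^f I$. Also $t <^b n+1$ and $\sigma <^b n+1$.

\[
\begin{array}{lcl}
\lceil b_n \rceil^{n}_{I} &=& f_I \\
\lceil b_i \rceil^{n}_{I} &=& b_i \\[4pt]
\lceil \bullet \rceil^{n}_{I} &=& \bullet \\
\lceil \sigma,\ t \rceil^{n}_{I} &=& \lceil \sigma \rceil^{n}_{I},\ \lceil t \rceil^{n}_{I} \\
\lceil \sigma,\ \mathsf{id}(X_i) \rceil^{n}_{I} &=& \lceil \sigma \rceil^{n}_{I},\ \mathsf{id}(X_i)
\end{array}
\]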


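Definition B.50 (Extension of binding) This is an extension of definition B.9 and adjustment for indexes. We assume $t <^f I, \cdot$ and $\sigma <^f I, \cdot$. Also $t <^b n$ and $\sigma <^b n$.

\[
\begin{array}{lcll}
\lfloor f_{I'} \rfloor^{n}_{I,\cdot} &=& b_n & \text{when } I' = I \\
\lfloor f_{I'} \rfloor^{n}_{I,\cdot} &=& f_{I'} & \text{otherwise} \\[4pt]
\lfloor \bullet \rfloor^{n}_{I,\cdot} &=& \bullet & \\
\lfloor \sigma,\ t \rfloor^{n}_{I,\cdot} &=& \lfloor \sigma \rfloor^{n}_{I,\cdot},\ \lfloor t \rfloor^{n}_{I,\cdot} & \\
\lfloor \sigma,\ \mathsf{id}(X_i) \rfloor^{n}_{I,\cdot} &=& \lfloor \sigma \rfloor^{n}_{I,\cdot},\ \mathsf{id}(X_i) &
\end{array}
\]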

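Definition B.51 (Environment subsumption) We define what it means for an environment to be a subenvironment of (be a prefix of, or be subsumed by) another one.

\[
\begin{array}{lcl}
\Phi \subseteq \Phi & & \\
\Phi \subseteq \Phi',\ t &\Leftarrow& \Phi \subseteq \Phi' \\
\Phi \subseteq \Phi',\ X_i &\Leftarrow& \Phi \subseteq \Phi' \\[4pt]
\Psi \subseteq \Psi',\ K &\Leftarrow& \Psi \subseteq \Psi'
\end{array}
\]

Definition B.52 (Substitution subsumption) We define what it means for a substitution to be a prefix of another one.

\[
\begin{array}{lcl}
\sigma \subseteq \sigma & & \\
\sigma \subseteq \sigma',\ t &\Leftarrow& \sigma \subseteq \sigma' \\
\sigma \subseteq \sigma',\ \mathsf{id}(X_i) &\Leftarrow& \sigma \subseteq \sigma'
\end{array}
\]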
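Both subsumption relations above say that the larger object extends the smaller one by trailing entries only. As a minimal Python sketch, assuming a hypothetical encoding of contexts and substitutions as lists of entries, the relation becomes a prefix test; the name `is_prefix` is illustrative.

```python
def is_prefix(small, big):
    """small ⊆ big: big is small followed by zero or more extra entries."""
    return len(small) <= len(big) and big[:len(small)] == small


phi = [("t", "A"), ("X", 0)]
phi_ext = phi + [("t", "B")]
print(is_prefix(phi, phi_ext), is_prefix(phi_ext, phi))
```

Peeling the last entry off `big` until the two lists coincide mirrors the inductive rules, one rule application per removed entry.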

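Definition B.53 The typing judgements defined in B.10 are redefined as follows.

1. $\vdash \Sigma\ \text{wf}$ is adjusted as shown below.

2. $\vdash \Phi\ \text{wf}$ is redefined as $\Psi \vdash_\Sigma \Phi\ \text{wf}$ and the rules below are added.

3. $\Phi \vdash t : t'$ is redefined as $\Psi;\ \Phi \vdash t : t'$, and adjusted as shown below.

4. $\Phi \vdash \sigma : \Phi'$ is redefined as $\Psi;\ \Phi \vdash \sigma : \Phi'$ and the rules below are added.

5. $\vdash \Psi\ \text{wf}$ is defined below.

6. $\Psi \vdash T : K$ is defined below.

Rules for $\vdash \Sigma\ \text{wf}$:

\[
\dfrac{\vdash \Sigma\ \text{wf} \qquad (c : \_) \notin \Sigma}{\vdash (\Sigma,\ c : t)\ \text{wf}}
\]

Rules for $\Psi \vdash_\Sigma \Phi\ \text{wf}$:

\[
\dfrac{}{\Psi \vdash \bullet\ \text{wf}}
\qquad
\dfrac{\Psi \vdash \Phi\ \text{wf} \qquad \Psi;\ \Phi \vdash t : s}{\Psi \vdash (\Phi,\ t)\ \text{wf}}
\qquad
\dfrac{\Psi \vdash \Phi\ \text{wf} \qquad \Psi.i = [\Phi]\, \text{ctx}}{\Psi \vdash (\Phi,\ X_i)\ \text{wf}}
\]

Rules for $\Psi;\ \Phi \vdash t : t'$:

\[
\dfrac{c : t \in \Sigma}{\Psi;\ \Phi \vdash_\Sigma c : t}
\qquad
\dfrac{\Phi.I = t}{\Psi;\ \Phi \vdash f_I : t}
\qquad
\dfrac{\Psi;\ \Phi \vdash t_1 : s \qquad \Psi;\ \Phi,\ t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : s' \qquad (s, s', s'') \in R}{\Psi;\ \Phi \vdash \Pi(t_1).t_2 : s''}
\]

\[
\dfrac{\Psi;\ \Phi \vdash t_1 : s \qquad \Psi;\ \Phi,\ t_1 \vdash \lceil t_2 \rceil_{|\Phi|} : t' \qquad \Psi;\ \Phi \vdash \Pi(t_1).\lfloor t' \rfloor_{|\Phi|,\cdot} : s'}{\Psi;\ \Phi \vdash \lambda(t_1).t_2 : \Pi(t_1).\lfloor t' \rfloor_{|\Phi|,\cdot}}
\qquad
\dfrac{\Psi;\ \Phi \vdash t_1 : \Pi(t).t' \qquad \Psi;\ \Phi \vdash t_2 : t}{\Psi;\ \Phi \vdash t_1\ t_2 : \lceil t' \rceil_{|\Phi|} \cdot (\mathsf{id}_\Phi,\ t_2)}
\]

\[
\dfrac{\Psi.i = T \qquad T = [\Phi']\, t' \qquad \Psi;\ \Phi \vdash \sigma : \Phi'}{\Psi;\ \Phi \vdash X_i/\sigma : t' \cdot \sigma}
\]

Rules for $\Psi;\ \Phi \vdash \sigma : \Phi'$:

\[
\dfrac{}{\Psi;\ \Phi \vdash \bullet : \bullet}
\qquad
\dfrac{\Psi;\ \Phi \vdash \sigma : \Phi' \qquad \Psi;\ \Phi \vdash t : t' \cdot \sigma}{\Psi;\ \Phi \vdash (\sigma,\ t) : (\Phi',\ t')}
\qquad
\dfrac{\Psi;\ \Phi \vdash \sigma : \Phi' \qquad \Psi.i = [\Phi']\, \text{ctx} \qquad \Phi',\ X_i \subseteq \Phi}{\Psi;\ \Phi \vdash (\sigma,\ \mathsf{id}(X_i)) : (\Phi',\ X_i)}
\]

Rules for $\vdash \Psi\ \text{wf}$:

\[
\dfrac{}{\vdash \bullet\ \text{wf}}
\qquad
\dfrac{\vdash \Psi\ \text{wf} \qquad \Psi \vdash \Phi\ \text{wf}}{\vdash (\Psi,\ [\Phi]\, \text{ctx})\ \text{wf}}
\qquad
\dfrac{\vdash \Psi\ \text{wf} \qquad \Psi \vdash [\Phi]\, t : [\Phi]\, s}{\vdash (\Psi,\ [\Phi]\, t)\ \text{wf}}
\]

Rules for $\Psi \vdash T : K$:

\[
\dfrac{\Psi;\ \Phi \vdash t : t'}{\Psi \vdash [\Phi]\, t : [\Phi]\, t'}
\qquad
\dfrac{\Psi \vdash \Phi,\ \Phi'\ \text{wf}}{\Psi \vdash [\Phi]\, \Phi' : [\Phi]\, \text{ctx}}
\]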

Lemma B.54 (Extension of lemma 2) 1. If $t <^f I$ and $|\Phi| = I$ then $t \cdot \mathsf{id}_\Phi = t$.

2. If $\sigma <^f I$ and $|\Phi| = I$ then $\sigma \cdot \mathsf{id}_\Phi = \sigma$.

Part 1 is proved by induction on $t <^f I$. The interesting case is $f_{I'}$, with $I' < I$. In this case we have to prove $\mathsf{id}_\Phi.I' = f_{I'}$. This is done by induction on $I' < I$.

When $I = I', \cdot$ we have by inversion of $|\Phi| = I$ that $\Phi = \Phi', t$ and $|\Phi'| = I'$. Thus $\mathsf{id}_\Phi = \mathsf{id}_{\Phi'}, f_{I'}$ and thus the desired result.

When $I = I', |X_i|$, exactly as above.

When $I = I^*, \cdot$ and $I' < I^*$, we have that $\Phi = \Phi^*, t$ and $|\Phi^*| = I^*$. By (inner) induction hypothesis we get that $\mathsf{id}_{\Phi^*}.I' = f_{I'}$. From this we directly get that $\mathsf{id}_\Phi.I' = f_{I'}$.

When $I = I^*, |X_i|$ and $I' < I^*$, entirely as the previous case.

Part 2 is trivial to prove by induction and use of part 1 in cases $\sigma = \bullet$ or $\sigma = \sigma', t$. In the case $\sigma = \sigma', \mathsf{id}(X_i)$ we have $\sigma' <^f I$, thus by induction $\sigma' \cdot \mathsf{id}_\Phi = \sigma'$, and furthermore $(\sigma', \mathsf{id}(X_i)) \cdot \mathsf{id}_\Phi = \sigma$.

Lemma B.55 (Length of subcontexts) If $\Phi \subseteq \Phi'$ then $|\Phi| \le |\Phi'|$.

Trivial by induction on $\Phi \subseteq \Phi'$.
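As a sanity check on part 1 of Lemma B.54, the statement $t \cdot \mathsf{id}_\Phi = t$ can be tested on a concrete term. The Python sketch below is illustrative only and rests on assumptions that are not in the text: terms are tuples (`("f", I)`, `("b", i)`, `("app", t1, t2)`, `("lam", t1, t2)`), contexts and substitutions are lists, and reading an `id(X_i)` entry at index `I` yields the variable `f_I`.

```python
def length(entries):
    """Length index of a context or substitution in the list encoding."""
    return tuple("." if kind == "t" else ("X", payload) for kind, payload in entries)


def identity(ctx):
    """id_Phi as in Definition B.47."""
    return [("t", ("f", length(ctx[:pos]))) if kind == "t" else ("id", payload)
            for pos, (kind, payload) in enumerate(ctx)]


def sub_access(sub, index):
    """sigma.I, assuming I < |sigma|."""
    kind, payload = sub[len(index)]
    # Assumption: an id(X_i) entry read at index I stands for the variable f_I.
    return payload if kind == "t" else ("f", index)


def apply(term, sub):
    """t . sigma: replace free variables, leave everything else in place."""
    tag = term[0]
    if tag == "f":
        return sub_access(sub, term[1])
    if tag in ("app", "lam"):
        return (tag, apply(term[1], sub), apply(term[2], sub))
    return term  # constants, sorts and bound variables are unchanged


phi = [("t", "A"), ("X", 0), ("t", "B")]
t = ("app", ("f", ()), ("lam", ("f", (".",)), ("f", (".", ("X", 0)))))
print(apply(t, identity(phi)) == t)
```

The check exercises all three positions of the context, including the one occupied by the context variable, which is the new case of the lemma.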

Lemma B.56 (Variable limits can be increased) 1. If $t <^f I$ and $I \le I'$ then $t <^f I'$.

2. If $t <^b n$ and $n \le n'$ then $t <^b n'$.

3. If $\sigma <^f I$ and $I \le I'$ then $\sigma <^f I'$.

4. If $\sigma <^b n$ and $n \le n'$ then $\sigma <^b n'$.

Trivial by induction on $t$ or $\sigma$.

Lemma B.57 (Extension of lemma 2) 1. If $\Psi;\ \Phi \vdash t : t'$ then $t <^f |\Phi|$ and $t <^b 0$.

2. If $\Psi;\ \Phi \vdash \sigma : \Phi'$ then $\sigma <^f |\Phi|$, $\sigma <^b 0$ and $|\sigma| = |\Phi'|$.

Part 1 is proved similarly as before.

Part 2 needs to account for the new case $\sigma = \sigma^*, \mathsf{id}(X_i)$.

By  inversion  of  typing  for  a  we  get  that  <!)'  =  <!>*,  V,-  with  o*  :  <I>*.  By  induction  we  get  that  a*  <1  Again 
by  inversion  of  typing  for  a  we  get  that  <!>*,  A,  C  <!>.  Thus  a*  <1  |<I>|  by  use  of  lemma  B.56.  Furthermore  from 
<f>*,  Xi  C  <f>  and  lemma  B.55  we  get  that  |A,|  <  |<I>|.  Thus  for  L  =  we  have  L,  |A,|  <  |<I>|  thus  we 
overall  get  a  <1  |<I>|. 

Furthermore  the  other  two  parts  of  the  theorem  are  trivial  from  induction  hypothesis. 

Lemma B.58 (Extension of lemma B.13) If $\Psi \vdash \Phi\ \text{wf}$ then for any $\Phi'$ such that $\Phi \subseteq \Phi'$ and $\Psi \vdash \Phi'\ \text{wf}$ we have that $\Psi;\ \Phi' \vdash \mathsf{id}_\Phi : \Phi$.

Similar to the original proof. The new case for $\Phi = \Phi', X_i$ works as follows. By induction hypothesis for $\Phi'$ we get that $\Psi;\ \Phi', X_i \vdash \mathsf{id}_{\Phi'} : \Phi'$. Now for any environment $\Phi^*$ such that $\Phi', X_i \subseteq \Phi^*$, by using the typing rule for $\mathsf{id}(X_i)$, we get the desired result.

Lemma B.59 (Extension of lemma B.14) If $\Psi \vdash \Phi\ \text{wf}$ and $|\Phi| = I$ then for all $I' < I$ with $\Phi.I' = t$, we have $\Phi.I' <^f I$.

Identical as before.

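Lemma B.60 (Extension of lemmas B.15 and B.15) 1. If $t <^f I$, $|\sigma| = I$, $t \cdot \sigma = t'$ and $\sigma \subseteq \sigma'$ then $t \cdot \sigma' = t'$.

2. If $\sigma <^f I$, $|\sigma'| = I$, $\sigma \cdot \sigma' = \sigma_r$ and $\sigma' \subseteq \sigma''$ then $\sigma \cdot \sigma'' = \sigma_r$.

Part 1 is identical as before. In part 2, the case $\sigma = \sigma_1, \mathsf{id}(X_i)$ is proved trivially by definition of substitution application.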

Lemma B.61 (Extension of lemma B.16) If $\Psi \vdash \Phi\ \text{wf}$, $\Phi.I = t$ and $\Psi;\ \Phi' \vdash \sigma : \Phi$, then $\Psi;\ \Phi' \vdash \sigma.I : t \cdot \sigma$.

The proof proceeds by structural induction on the typing derivation for $\sigma$ as before. In case $\sigma = \sigma^*, \mathsf{id}(X_i)$, we have that $(\Phi^*, X_i) \subseteq \Phi'$. We have that $\Phi^*.I = \Phi.I = t$ (since $I \neq |\Phi^*|$, because $(\Phi^*, X_i).|\Phi^*| \neq t$). Thus from the induction hypothesis for $\sigma^*$ we get that $\Psi;\ \Phi' \vdash \sigma^*.I : t \cdot \sigma^*$. Using lemma B.60 and also the fact that $\sigma.I = \sigma^*.I$, we get that $\Psi;\ \Phi' \vdash \sigma.I : t \cdot \sigma$.

Lemma B.62 (Extension of lemma B.17) 1. If $t <^f I$, $t <^b n+1$, $\sigma <^f I'$ and $|\sigma| = I$ then $\lceil t \cdot \sigma \rceil^{n}_{I'} = \lceil t \rceil^{n}_{I} \cdot (\sigma,\ f_{I'})$.

2. If $\sigma' <^f I$, $\sigma' <^b n+1$, $\sigma <^f I'$ and $|\sigma| = I$ then $\lceil \sigma' \cdot \sigma \rceil^{n}_{I'} = \lceil \sigma' \rceil^{n}_{I} \cdot (\sigma,\ f_{I'})$.

Part 1 is entirely similar as before, with slight adjustments to account for the new type of indices. Part 2 needs to account for the new case of $\sigma' = \sigma'', \mathsf{id}(X_i)$, which is entirely trivial based on the definition.

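Lemma B.63 (Extension of lemma B.18) 1. If $t <^f I, \cdot$, $t <^b n$, $\sigma <^f I'$ and $|\sigma| = I$ then $\lfloor t \cdot (\sigma,\ f_{I'}) \rfloor^{n}_{I',\cdot} = \lfloor t \rfloor^{n}_{I,\cdot} \cdot \sigma$.

2. If $\sigma' <^f I, \cdot$, $\sigma' <^b n$, $\sigma <^f I'$ and $|\sigma| = I$ then $\lfloor \sigma' \cdot (\sigma,\ f_{I'}) \rfloor^{n}_{I',\cdot} = \lfloor \sigma' \rfloor^{n}_{I,\cdot} \cdot \sigma$.

Similarly to the above.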

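Lemma B.64 (Extension of lemma B.19) 1. If $t <^f I$, $|\sigma| = I$, $\sigma <^f I'$ and $|\sigma'| = I'$ then $(t \cdot \sigma) \cdot \sigma' = t \cdot (\sigma \cdot \sigma')$.

2. If $\sigma_1 <^f I$, $|\sigma| = I$, $\sigma <^f I'$ and $|\sigma'| = I'$ then $(\sigma_1 \cdot \sigma) \cdot \sigma' = \sigma_1 \cdot (\sigma \cdot \sigma')$.

Part 1 is identical as before. Part 2 needs to account for the case where $\sigma_1 = \sigma'_1, \mathsf{id}(X_i)$, which is entirely trivial.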

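Lemma B.65 (Extension of lemma B.20) If $|\sigma| = I$ and $|\Phi| = I$ then $\mathsf{id}_\Phi \cdot \sigma = \sigma$.

We need to account for the new case of $\Phi = \Phi', X_i$. In that case, $\mathsf{id}_{\Phi', X_i} = \mathsf{id}_{\Phi'}, \mathsf{id}(X_i)$. By inversion of $|\sigma| = I = |\Phi'|, |X_i|$ we get that $\sigma = \sigma', \mathsf{id}(X_i)$. By induction hypothesis we get $\mathsf{id}_{\Phi'} \cdot \sigma' = \sigma'$. By lemma B.60 we get $\mathsf{id}_{\Phi'} \cdot \sigma = \sigma'$. Last, it is trivial to see that $(\mathsf{id}_{\Phi'}, \mathsf{id}(X_i)) \cdot \sigma = \sigma', \mathsf{id}(X_i) = \sigma$.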

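Lemma B.66 (Extension of lemma B.21) 1. If $\lceil t \rceil^{n}_{I} = \lceil t' \rceil^{n}_{I}$ then $t = t'$.

2. If $\lceil \sigma \rceil^{n}_{I} = \lceil \sigma' \rceil^{n}_{I}$ then $\sigma = \sigma'$.

Part 1 is identical as before; part 2 holds trivially for the new case of $\sigma$.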

Theorem B.67 (Extension of main substitution theorem B.22) 1. If $\Psi;\ \Phi \vdash t : t'$ and $\Psi;\ \Phi' \vdash \sigma : \Phi$ then $\Psi;\ \Phi' \vdash t \cdot \sigma : t' \cdot \sigma$.

2. If $\Psi;\ \Phi' \vdash \sigma : \Phi$ and $\Psi;\ \Phi'' \vdash \sigma' : \Phi'$ then $\Psi;\ \Phi'' \vdash \sigma \cdot \sigma' : \Phi$.

3. If $\Psi \vdash [\Phi']\, t : [\Phi']\, t'$ and $\Psi;\ \Phi \vdash \sigma : \Phi'$ then $\Psi \vdash [\Phi]\, t \cdot \sigma : [\Phi]\, t' \cdot \sigma$.

Part 1 is identical as before; all the needed theorems were adjusted above, so the new form of indexes does not change the proof at all. The only case that needs adjustment is the metavariables case.

Case $\dfrac{\Psi.i = T \qquad T = [\Phi_0]\, t_0 \qquad \Psi;\ \Phi \vdash \sigma_0 : \Phi_0}{\Psi;\ \Phi \vdash X_i/\sigma_0 : t'}$ $\triangleright$

From $\Psi;\ \Phi \vdash X_i/\sigma_0 : t'$ we get that $\Psi.i = [\Phi_0]\, t_0$, $\Psi;\ \Phi \vdash \sigma_0 : \Phi_0$ and $t' = t_0 \cdot \sigma_0$.

Applying the second part of the lemma for $\sigma = \sigma_0$ and $\sigma' = \sigma$ we get that $\Psi;\ \Phi' \vdash \sigma_0 \cdot \sigma : \Phi_0$.

Thus applying the same typing rule for $t = X_i/(\sigma_0 \cdot \sigma)$ we get that $\Psi;\ \Phi' \vdash X_i/(\sigma_0 \cdot \sigma) : t_0 \cdot (\sigma_0 \cdot \sigma)$.

Taking into account the definition of $\cdot$ and also lemma B.64, we have that this is the desired result.

For the second part, we need to account for the new case of substitutions.

Case $\dfrac{\Psi;\ \Phi' \vdash \sigma : \Phi_0 \qquad \Psi.i = [\Phi_0]\, \text{ctx} \qquad \Phi_0,\ X_i \subseteq \Phi'}{\Psi;\ \Phi' \vdash (\sigma,\ \mathsf{id}(X_i)) : (\Phi_0,\ X_i)}$ $\triangleright$

By induction hypothesis for $\sigma$, we get: $\Psi;\ \Phi'' \vdash \sigma \cdot \sigma' : \Phi_0$.

We need to prove that $(\Phi_0, X_i) \subseteq \Phi''$.

We have that $\Psi;\ \Phi'' \vdash \sigma' : \Phi'$.

By induction on $(\Phi_0, X_i) \subseteq \Phi'$ and repeated inversions of $\sigma'$ we arrive at a $\sigma'' \subseteq \sigma'$ such that: $\Psi;\ \Phi'' \vdash \sigma'' : \Phi_0, X_i$.

By inversion of this we get that $(\Phi_0, X_i) \subseteq \Phi''$.

Thus, using the same typing rule, we get $\Psi;\ \Phi'' \vdash (\sigma \cdot \sigma',\ \mathsf{id}(X_i)) : (\Phi_0,\ X_i)$, which is the desired result.

For the third part, the proof is identical as before.

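Lemma B.68 (Extension of lemma B.24) If $\Psi;\ \Phi \vdash t : t'$ then either $t' = \text{Type}'$ or $\Psi;\ \Phi \vdash t' : s$.

Identical as before.

Lemma B.69 (Extension of the lemma B.25) 1. If $\Psi;\ \Phi \vdash t : t'$ and $\Phi \subseteq \Phi'$ then $\Psi;\ \Phi' \vdash t : t'$.

2. If $\Psi;\ \Phi \vdash \sigma : \Phi''$ and $\Phi \subseteq \Phi'$ then $\Psi;\ \Phi' \vdash \sigma : \Phi''$.

Identical as before.

Lemma B.70 (Adaptation of lemma 4) 1. If $\Psi;\ \Phi \vdash t : t'$ and $\Psi \subseteq \Psi'$ then $\Psi';\ \Phi \vdash t : t'$.

2. If $\Psi;\ \Phi \vdash \sigma : \Phi'$ and $\Psi \subseteq \Psi'$ then $\Psi';\ \Phi \vdash \sigma : \Phi'$.

3. If $\Psi \vdash \Phi\ \text{wf}$ and $\Psi \subseteq \Psi'$ then $\Psi' \vdash \Phi\ \text{wf}$.

4. If $\Psi \vdash T : K$ and $\Psi \subseteq \Psi'$ then $\Psi' \vdash T : K$.

Parts 2 and 3 are trivial for the new cases; otherwise identical as before.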


Now  we  have  proved  the  fundamentals.  We  proceed  to  define  substitutions  for  the  extension  variables  (meta- 
and  context-variables),  typing  for  such  substitutions,  and  prove  an  extensions  substitution  theorem. 

Definition  B.71  (Substitutions  of  extension  variables)  The  syntax  of  substitutions  for  meta-  and  context- 
variables  is  given  below. 

\[
\sigma_\Psi ::= \bullet \mid \sigma_\Psi,\ T
\]

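Definition B.72 (Context, substitution, index concatenation) We define what it means to concatenate one context (substitution, index) to another.

Contexts, $\Phi, \Phi'$:

\[
\begin{array}{lcl}
\Phi,\ (\bullet) &=& \Phi \\
\Phi,\ (\Phi',\ t) &=& (\Phi,\ \Phi'),\ t \\
\Phi,\ (\Phi',\ X_i) &=& (\Phi,\ \Phi'),\ X_i
\end{array}
\]

Substitutions, $\sigma, \sigma'$:

\[
\begin{array}{lcl}
\sigma,\ (\bullet) &=& \sigma \\
\sigma,\ (\sigma',\ t) &=& (\sigma,\ \sigma'),\ t \\
\sigma,\ (\sigma',\ \mathsf{id}(X_i)) &=& (\sigma,\ \sigma'),\ \mathsf{id}(X_i)
\end{array}
\]

Indexes, $I, I'$:

\[
\begin{array}{lcl}
I,\ (\bullet) &=& I \\
I,\ (I',\ \cdot) &=& (I,\ I'),\ \cdot \\
I,\ (I',\ |X_i|) &=& (I,\ I'),\ |X_i|
\end{array}
\]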
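The three concatenation operations share one recursion scheme: peel the last entry off the right operand, concatenate the rest, and put the entry back. A minimal Python sketch, assuming a hypothetical list encoding of contexts (entries `("t", term)` or `("X", i)`) and tuple encoding of indexes, makes the scheme explicit; the function names are illustrative.

```python
def length(entries):
    """Length index of a context in the list encoding."""
    return tuple("." if kind == "t" else ("X", payload) for kind, payload in entries)


def concat(left, right):
    """Phi, Phi' by recursion on the right operand, as in Definition B.72."""
    if not right:
        return list(left)
    return concat(left, right[:-1]) + [right[-1]]


phi = [("t", "A"), ("X", 0)]
phi2 = [("t", "B"), ("X", 1)]
print(concat(phi, phi2))
print(length(concat(phi, phi2)) == length(phi) + length(phi2))
```

In this encoding the length of a concatenation is the concatenation of the lengths, with index concatenation being plain tuple concatenation.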

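Definition B.73 (Partial identity substitution) We define what partial identity substitutions (for a suffix of a context) are.

\[
\begin{array}{lcl}
\mathsf{id}_{[\Phi]\,\bullet} &=& \bullet \\
\mathsf{id}_{[\Phi]\,\Phi',\, t} &=& \mathsf{id}_{[\Phi]\,\Phi'},\ f_{|\Phi|,\,|\Phi'|} \\
\mathsf{id}_{[\Phi]\,\Phi',\, X_i} &=& \mathsf{id}_{[\Phi]\,\Phi'},\ \mathsf{id}(X_i)
\end{array}
\]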

Definition B.74 (Extensions substitution length and access) Defined below.

\[
\begin{array}{lcl}
|\bullet| &=& 0 \\
|\sigma_\Psi,\ T| &=& 1 + |\sigma_\Psi|
\end{array}
\]

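Definition B.75 (Extension substitution and context concatenation) We define concatenation of extension substitutions and extensions contexts below.

\[
\begin{array}{lcl}
\Psi,\ (\bullet) &=& \Psi \\
\Psi,\ (\Psi',\ K) &=& (\Psi,\ \Psi'),\ K \\[4pt]
\sigma_\Psi,\ (\bullet) &=& \sigma_\Psi \\
\sigma_\Psi,\ (\sigma'_\Psi,\ T) &=& (\sigma_\Psi,\ \sigma'_\Psi),\ T
\end{array}
\]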

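Definition B.76 (Extensions substitution subsumption) Defined below.

\[
\begin{array}{lcl}
\sigma_\Psi \subseteq \sigma_\Psi & & \\
\sigma_\Psi \subseteq \sigma'_\Psi,\ T &\Leftarrow& \sigma_\Psi \subseteq \sigma'_\Psi
\end{array}
\]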


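Definition B.77 (Application of extensions substitution) This is an adaptation of definition B.28.

Indexes, $I \cdot \sigma_\Psi$:

\[
\begin{array}{lcll}
\bullet \cdot \sigma_\Psi &=& \bullet & \\
(I,\ \cdot) \cdot \sigma_\Psi &=& I \cdot \sigma_\Psi,\ \cdot & \\
(I,\ |X_i|) \cdot \sigma_\Psi &=& (I \cdot \sigma_\Psi),\ |\Phi'| & \text{when } \sigma_\Psi.i = [\Phi]\, \Phi'
\end{array}
\]

Terms, $t \cdot \sigma_\Psi$:

\[
\begin{array}{lcll}
f_I \cdot \sigma_\Psi &=& f_{I \cdot \sigma_\Psi} & \\
(X_i/\sigma) \cdot \sigma_\Psi &=& t \cdot (\sigma \cdot \sigma_\Psi) & \text{when } \sigma_\Psi.i = [\Phi]\, t
\end{array}
\]

Substitutions, $\sigma \cdot \sigma_\Psi$:

\[
\begin{array}{lcll}
\bullet \cdot \sigma_\Psi &=& \bullet & \\
(\sigma,\ t) \cdot \sigma_\Psi &=& \sigma \cdot \sigma_\Psi,\ t \cdot \sigma_\Psi & \\
(\sigma,\ \mathsf{id}(X_i)) \cdot \sigma_\Psi &=& \sigma \cdot \sigma_\Psi,\ \mathsf{id}_{\sigma_\Psi.i} & \text{when } \sigma_\Psi.i = [\Phi]\, \Phi'
\end{array}
\]

Contexts, $\Phi \cdot \sigma_\Psi$:

\[
\begin{array}{lcll}
\bullet \cdot \sigma_\Psi &=& \bullet & \\
(\Phi,\ t) \cdot \sigma_\Psi &=& \Phi \cdot \sigma_\Psi,\ t \cdot \sigma_\Psi & \\
(\Phi,\ X_i) \cdot \sigma_\Psi &=& \Phi \cdot \sigma_\Psi,\ \Phi' & \text{when } \sigma_\Psi.i = [\Phi'']\, \Phi'
\end{array}
\]

Modal terms and contexts, $T \cdot \sigma_\Psi$:

\[
\begin{array}{lcl}
([\Phi]\, t) \cdot \sigma_\Psi &=& [\Phi \cdot \sigma_\Psi]\, (t \cdot \sigma_\Psi) \\
([\Phi]\, \Phi') \cdot \sigma_\Psi &=& [\Phi \cdot \sigma_\Psi]\, (\Phi' \cdot \sigma_\Psi)
\end{array}
\]

Classifiers, $K \cdot \sigma_\Psi$:

\[
\begin{array}{lcl}
([\Phi]\, t) \cdot \sigma_\Psi &=& [\Phi \cdot \sigma_\Psi]\, (t \cdot \sigma_\Psi) \\
([\Phi]\, \text{ctx}) \cdot \sigma_\Psi &=& [\Phi \cdot \sigma_\Psi]\, \text{ctx}
\end{array}
\]

Extension substitutions, $\sigma_\Psi \cdot \sigma'_\Psi$:

\[
\begin{array}{lcl}
\bullet \cdot \sigma'_\Psi &=& \bullet \\
(\sigma_\Psi,\ T) \cdot \sigma'_\Psi &=& \sigma_\Psi \cdot \sigma'_\Psi,\ T \cdot \sigma'_\Psi
\end{array}
\]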

Definition  B.78  (Application  of  extended  substitution  to  open  extended  context)  Assuming  that  *F'  does  not 
include  variables  bigger  than  Aixjxl  we  have: 


'P'-a-ix 


• .  a,j,  =  • 

('P^,  ^)  •  CJxj;  =  •  (Jxjf,  ^  •  ({Jxji,  Alxjii ,  •  •  •  ,  Aixj/n  Hj</|  ) 


Definition B.79 (Identity extension substitution) The identity substitution for extension contexts is defined below.

\[
\begin{array}{lcl}
\mathsf{id}_{\bullet} &=& \bullet \\
\mathsf{id}_{\Psi,\, K} &=& \mathsf{id}_{\Psi},\ X_{|\Psi|}
\end{array}
\]


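Definition B.80 (Extensions substitution typing) The typing judgement for extensions substitutions is redefined as $\Psi \vdash \sigma_\Psi : \Psi'$. The rules are given below. We also define typing for open extension contexts.

\[
\dfrac{}{\Psi \vdash \bullet : \bullet}
\qquad
\dfrac{\Psi \vdash \sigma_\Psi : \Psi' \qquad \Psi \vdash T : K \cdot \sigma_\Psi}{\Psi \vdash (\sigma_\Psi,\ T) : (\Psi',\ K)}
\]

Rule for $\Psi \vdash \Psi'\ \text{wf}$:

\[
\dfrac{\vdash \Psi,\ \Psi'\ \text{wf}}{\Psi \vdash \Psi'\ \text{wf}}
\]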


Lemma B.81 (Interaction of extensions substitution and length) 1. $|\sigma| \cdot \sigma_\Psi = |\sigma \cdot \sigma_\Psi|$

2. $|\Phi| \cdot \sigma_\Psi = |\Phi \cdot \sigma_\Psi|$

By induction on $\sigma$ and $\Phi$.

Lemma B.82 (Interaction of environment subsumption and length) If $\Phi \subseteq \Phi'$ then $|\Phi| \le |\Phi'|$.

By induction on $\Phi \subseteq \Phi'$.

Lemma B.83 (Interaction of environment subsumption and extensions substitution) If $\Phi \subseteq \Phi'$ then $\Phi \cdot \sigma_\Psi \subseteq \Phi' \cdot \sigma_\Psi$.

By induction on $\Phi \subseteq \Phi'$.

Lemma B.84 (Interaction of extensions substitution and element access) 1. $(\sigma.I) \cdot \sigma_\Psi = (\sigma \cdot \sigma_\Psi).(I \cdot \sigma_\Psi)$

2. $(\Phi.I) \cdot \sigma_\Psi = (\Phi \cdot \sigma_\Psi).(I \cdot \sigma_\Psi)$

By induction on $I$ and taking into account the implicit assumption that $I < |\sigma|$ or $I < |\Phi|$.

Lemma B.85 (Extension of lemma B.30) If $\Psi \vdash \sigma_\Psi : \Psi'$ and $\sigma_\Psi.i = [\Phi]\, t$ then $t <^f |\Phi|$ and $t <^b 0$.

Identical as before.

Lemma B.86 (Extension of lemma B.31) If $t <^b n$ then $\lceil t \cdot \sigma \rceil^{n} = t \cdot \lceil \sigma \rceil^{n}$.

Identical as before.

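Lemma B.87 (Extension of lemma B.32) 1. If $\Psi \vdash \sigma_\Psi : \Psi'$ then $\lceil t \rceil^{n}_{I} \cdot \sigma_\Psi = \lceil t \cdot \sigma_\Psi \rceil^{n}_{I \cdot \sigma_\Psi}$.

2. If $\Psi \vdash \sigma_\Psi : \Psi'$ then $\lceil \sigma \rceil^{n}_{I} \cdot \sigma_\Psi = \lceil \sigma \cdot \sigma_\Psi \rceil^{n}_{I \cdot \sigma_\Psi}$.

Part 1 is proved by induction on $t$.

In the case $t = b_n$, we have that the left-hand side is equal to $f_I \cdot \sigma_\Psi = f_{I \cdot \sigma_\Psi}$. The right-hand side is equal to $\lceil b_n \rceil^{n}_{I \cdot \sigma_\Psi} = f_{I \cdot \sigma_\Psi}$.

In the case $t = X_i/\sigma$, this is proved entirely as before, with trivial changes to account for the new indexes.

Part 2 is proved by induction on $\sigma$, as previously. For the new case $\sigma = \sigma', \mathsf{id}(X_i)$, the result is trivial.

Lemma B.88 (Extension of lemma B.33) If $t <^b n$ then $\lfloor t \cdot \sigma \rfloor^{n}_{I,\cdot} = t \cdot \lfloor \sigma \rfloor^{n}_{I,\cdot}$.

Identical as before.

Lemma B.89 (Extension of lemma B.34) 1. If $\Psi \vdash \sigma_\Psi : \Psi'$ then $\lfloor t \rfloor^{n}_{I,\cdot} \cdot \sigma_\Psi = \lfloor t \cdot \sigma_\Psi \rfloor^{n}_{I \cdot \sigma_\Psi,\cdot}$.

2. If $\Psi \vdash \sigma_\Psi : \Psi'$ then $\lfloor \sigma \rfloor^{n}_{I,\cdot} \cdot \sigma_\Psi = \lfloor \sigma \cdot \sigma_\Psi \rfloor^{n}_{I \cdot \sigma_\Psi,\cdot}$.

Proved similarly to lemma B.87.

When $t = f_I$, we have that the left-hand side is equal to $b_n$, while the right-hand side is equal to $\lfloor f_{I \cdot \sigma_\Psi} \rfloor^{n}_{I \cdot \sigma_\Psi,\cdot} = b_n$.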

Lemma B.90 (Extension of lemma B.35) 1. $(t \cdot \sigma) \cdot \sigma_\Psi = (t \cdot \sigma_\Psi) \cdot (\sigma \cdot \sigma_\Psi)$

2. $(\sigma \cdot \sigma') \cdot \sigma_\Psi = (\sigma \cdot \sigma_\Psi) \cdot (\sigma' \cdot \sigma_\Psi)$

Part 1 is entirely similar as before, with the exception of case $t = f_I$. This is proved using lemma B.84. Part 2 is trivially proved for the new case of $\sigma$.

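Lemma B.91 (Extension of lemma B.36) $\mathsf{id}_\Phi \cdot \sigma_\Psi = \mathsf{id}_{\Phi \cdot \sigma_\Psi}$

By induction on $\Phi$.

When $\Phi = \bullet$, trivial.

When $\Phi = \Phi', t$, by induction we have $\mathsf{id}_{\Phi'} \cdot \sigma_\Psi = \mathsf{id}_{\Phi' \cdot \sigma_\Psi}$. Thus $(\mathsf{id}_{\Phi'}, f_{|\Phi'|}) \cdot \sigma_\Psi = \mathsf{id}_{\Phi' \cdot \sigma_\Psi}, f_{|\Phi'| \cdot \sigma_\Psi} = \mathsf{id}_{\Phi' \cdot \sigma_\Psi}, f_{|\Phi' \cdot \sigma_\Psi|} = \mathsf{id}_{\Phi \cdot \sigma_\Psi}$.

When $\Phi = \Phi', X_i$, we have that $\mathsf{id}_{\Phi' \cdot \sigma_\Psi}, \mathsf{id}_{\sigma_\Psi.i} = \mathsf{id}_{\Phi \cdot \sigma_\Psi}$ (by simple induction on $\Phi'' = \sigma_\Psi.i$).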

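Lemma B.92 (Extension of lemma B.37) 1. If $\Psi;\ \Phi \vdash t : t'$, $|\sigma_\Psi| = |\Psi|$ and $\sigma_\Psi \subseteq \sigma'_\Psi$, then $t \cdot \sigma'_\Psi = t \cdot \sigma_\Psi$.

2. If $\Psi;\ \Phi \vdash \sigma : \Phi'$, $|\sigma_\Psi| = |\Psi|$ and $\sigma_\Psi \subseteq \sigma'_\Psi$ then $\sigma \cdot \sigma'_\Psi = \sigma \cdot \sigma_\Psi$.

3. If $\Psi \vdash \Phi\ \text{wf}$, $|\sigma_\Psi| = |\Psi|$ and $\sigma_\Psi \subseteq \sigma'_\Psi$ then $\Phi \cdot \sigma'_\Psi = \Phi \cdot \sigma_\Psi$.

4. If $\Psi \vdash T : K$, $|\sigma_\Psi| = |\Psi|$ and $\sigma_\Psi \subseteq \sigma'_\Psi$ then $T \cdot \sigma'_\Psi = T \cdot \sigma_\Psi$.

5. If $K \cdot \sigma_\Psi$ is well-defined, and $\sigma_\Psi \subseteq \sigma'_\Psi$ then $K \cdot \sigma_\Psi = K \cdot \sigma'_\Psi$.

6. If $\Psi \cdot \sigma_\Psi$ is well-defined, and $\sigma_\Psi \subseteq \sigma'_\Psi$, then $\Psi \cdot \sigma_\Psi = \Psi \cdot \sigma'_\Psi$.

Parts 2 and 3 are trivially extended for the new cases; others are identical or easily provable by induction.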

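Lemma B.93 (Extension of lemma B.38) If $\vdash \Psi\ \text{wf}$ and $\Psi \vdash \sigma_\Psi : \Psi'$ then $\Psi \vdash \sigma_\Psi.i : \Psi'.i \cdot \sigma_\Psi$.

By induction on $\sigma_\Psi$ and then cases on $i < |\sigma_\Psi|$.

If $i = |\sigma_\Psi| - 1$ then proceed by cases for $\sigma_\Psi$.

If $\sigma_\Psi = \bullet$, then the case is impossible.

If $\sigma_\Psi = \sigma'_\Psi, [\Phi]\, t$, we have by typing inversion for $\sigma_\Psi$ that $\Psi \vdash [\Phi]\, t : (\Psi'.i) \cdot \sigma'_\Psi$, which by lemma B.92 is equal to the desired.

If $\sigma_\Psi = \sigma'_\Psi, [\Phi]\, \Phi'$, we get by typing inversion for $\sigma_\Psi$ that $\Psi \vdash [\Phi]\, \Phi' : (\Psi'.i) \cdot \sigma'_\Psi$, which again by lemma B.92 is the desired.

If $i < |\sigma_\Psi| - 1$ then by inversion of $\sigma_\Psi$ we have that either $\sigma_\Psi = \sigma'_\Psi, [\Phi]\, t$ or $\sigma_\Psi = \sigma'_\Psi, [\Phi]\, \Phi'$. In both cases $i \le |\sigma'_\Psi| - 1$, so by induction hypothesis we get $\Psi \vdash \sigma'_\Psi.i : \Psi'.i \cdot \sigma'_\Psi$ which, using B.92, is the desired.

Lemma B.94 (Interaction of two extension substitutions) 1. $(I \cdot \sigma_\Psi) \cdot \sigma'_\Psi = I \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$

2. $(t \cdot \sigma_\Psi) \cdot \sigma'_\Psi = t \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$

3. $(\Phi \cdot \sigma_\Psi) \cdot \sigma'_\Psi = \Phi \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$

4. $(\sigma \cdot \sigma_\Psi) \cdot \sigma'_\Psi = \sigma \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$

5. $(T \cdot \sigma_\Psi) \cdot \sigma'_\Psi = T \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$

6. $(K \cdot \sigma_\Psi) \cdot \sigma'_\Psi = K \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$

7. $(\Psi \cdot \sigma_\Psi) \cdot \sigma'_\Psi = \Psi \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$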

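Part 1 By induction on $I$. The interesting case is $I = I', |X_i|$. In that case we have $(I \cdot \sigma_\Psi) \cdot \sigma'_\Psi = (I' \cdot \sigma_\Psi) \cdot \sigma'_\Psi,\ |\sigma_\Psi.i| \cdot \sigma'_\Psi$. Trivially $|\sigma_\Psi.i| \cdot \sigma'_\Psi = |(\sigma_\Psi \cdot \sigma'_\Psi).i|$, and also using the induction hypothesis, we have that the above is further equal to $I' \cdot (\sigma_\Psi \cdot \sigma'_\Psi),\ |(\sigma_\Psi \cdot \sigma'_\Psi).i|$, which is exactly the desired.

Part 2 By induction on $t$. The interesting case is $t = X_i/\sigma$. The left-hand side is then equal to $(t \cdot (\sigma \cdot \sigma_\Psi)) \cdot \sigma'_\Psi$, with $\sigma_\Psi.i = [\Phi]\, t$. This is further rewritten as $(t \cdot (\sigma \cdot \sigma_\Psi)) \cdot \sigma'_\Psi = (t \cdot \sigma'_\Psi) \cdot ((\sigma \cdot \sigma_\Psi) \cdot \sigma'_\Psi)$ through lemma B.90. Furthermore through part 4 we get that this is equal to $(t \cdot \sigma'_\Psi) \cdot (\sigma \cdot (\sigma_\Psi \cdot \sigma'_\Psi))$.

The right-hand side is written as: $(X_i/\sigma) \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$. We have that $(\sigma_\Psi \cdot \sigma'_\Psi).i = (\sigma_\Psi.i) \cdot \sigma'_\Psi = [\Phi \cdot \sigma'_\Psi]\, (t \cdot \sigma'_\Psi)$.

Thus $(X_i/\sigma) \cdot (\sigma_\Psi \cdot \sigma'_\Psi) = (t \cdot \sigma'_\Psi) \cdot (\sigma \cdot (\sigma_\Psi \cdot \sigma'_\Psi))$.

Part 3 By induction on $\Phi$. When $\Phi = \Phi_1, X_i$, we have that the left-hand side is equal to $(\Phi_1 \cdot \sigma_\Psi) \cdot \sigma'_\Psi,\ \Phi' \cdot \sigma'_\Psi$ with $\sigma_\Psi.i = [\Phi'']\, \Phi'$. By induction hypothesis this is further equal to $\Phi_1 \cdot (\sigma_\Psi \cdot \sigma'_\Psi),\ \Phi' \cdot \sigma'_\Psi$.

Also, we have that $(\sigma_\Psi \cdot \sigma'_\Psi).i = [\Phi'' \cdot \sigma'_\Psi]\, (\Phi' \cdot \sigma'_\Psi)$. Thus the right-hand side is equal to $\Phi_1 \cdot (\sigma_\Psi \cdot \sigma'_\Psi),\ \Phi' \cdot \sigma'_\Psi$, which is exactly equal to the left-hand side.

Rest Similarly as above.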

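Lemma B.95 (Interaction of identity substitution and extension substitution) If $|\sigma_\Psi| = |\Psi|$ then $\mathsf{id}_\Psi \cdot \sigma_\Psi = \sigma_\Psi$.

By induction on $\Psi$. If $\Psi = \bullet$, trivial. If $\Psi = \Psi', K$ then $\mathsf{id}_{\Psi', K} \cdot \sigma_\Psi = (\mathsf{id}_{\Psi'},\ X_{|\Psi'|}) \cdot \sigma_\Psi$. From $|\sigma_\Psi| = |\Psi|$ we have that $\sigma_\Psi = \sigma'_\Psi, T$, and from the induction hypothesis for $\sigma'_\Psi$ we get that the above is equal to $\sigma'_\Psi,\ X_{|\Psi'|} \cdot \sigma_\Psi = \sigma'_\Psi,\ T = \sigma_\Psi$.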
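The shape of this argument can be illustrated on a drastically simplified model. The Python sketch below is an assumption-laden illustration, not the development's definition: an extension substitution is a list of opaque entries, the identity for a context of length `n` is the list of variables `X_0 .. X_{n-1}`, and applying a substitution to a variable `X_i` simply looks up position `i`. The contexts and inner substitutions that accompany each variable in the text are deliberately ignored.

```python
def identity_ext(n):
    """id_Psi for |Psi| = n, with X_i encoded as the tuple ('X', i)."""
    return [("X", i) for i in range(n)]


def apply_entry(entry, sigma_psi):
    """Apply sigma_Psi to one entry: variables are looked up, others kept."""
    if entry[0] == "X":
        return sigma_psi[entry[1]]
    return entry


def apply_ext(sub, sigma_psi):
    """sub . sigma_Psi, entry by entry."""
    return [apply_entry(e, sigma_psi) for e in sub]


sigma_psi = [("T", "first"), ("T", "second"), ("T", "third")]
print(apply_ext(identity_ext(len(sigma_psi)), sigma_psi) == sigma_psi)
```

The side condition $|\sigma_\Psi| = |\Psi|$ of the lemma corresponds to building the identity with exactly `len(sigma_psi)` variables, so that every lookup is in range.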

Parti 

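Lemma B.96 (Interaction of identity substitution and extension substitution) 1. $t \cdot \mathsf{id}_\Psi = t$

2. $\Phi \cdot \mathsf{id}_\Psi = \Phi$

3. $\sigma \cdot \mathsf{id}_\Psi = \sigma$

4. $T \cdot \mathsf{id}_\Psi = T$

5. $K \cdot \mathsf{id}_\Psi = K$

6. $\sigma_\Psi \cdot \mathsf{id}_\Psi = \sigma_\Psi$

All are trivially proved by induction. We give details only for the $\sigma_\Psi$ case.

By induction on $\sigma_\Psi$. If $\sigma_\Psi = \bullet$, trivial. If $\sigma_\Psi = \sigma'_\Psi, T$, then we have that $(\sigma'_\Psi, T) \cdot \mathsf{id}_\Psi = \sigma'_\Psi \cdot \mathsf{id}_\Psi,\ T \cdot \mathsf{id}_\Psi$. The first part is equal to $\sigma'_\Psi$ by induction hypothesis (and use of lemma B.92). For the second we split cases for $T$. We have $([\Phi]\, t) \cdot \mathsf{id}_\Psi = [\Phi \cdot \mathsf{id}_\Psi]\, (t \cdot \mathsf{id}_\Psi) = [\Phi]\, t$, and similarly for $([\Phi]\, \Phi') \cdot \mathsf{id}_\Psi$ by use of the other parts.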

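Theorem B.97 (Extension of lemma B.39) 1. If $\Psi;\ \Phi \vdash t : t'$ and $\Psi' \vdash \sigma_\Psi : \Psi$ then $\Psi';\ \Phi \cdot \sigma_\Psi \vdash t \cdot \sigma_\Psi : t' \cdot \sigma_\Psi$.

2. If $\Psi;\ \Phi \vdash \sigma : \Phi'$ and $\Psi' \vdash \sigma_\Psi : \Psi$ then $\Psi';\ \Phi \cdot \sigma_\Psi \vdash \sigma \cdot \sigma_\Psi : \Phi' \cdot \sigma_\Psi$.

3. If $\Psi \vdash \Phi\ \text{wf}$ and $\Psi' \vdash \sigma_\Psi : \Psi$ then $\Psi' \vdash \Phi \cdot \sigma_\Psi\ \text{wf}$.

4. If $\Psi \vdash T : K$ and $\Psi' \vdash \sigma_\Psi : \Psi$ then $\Psi' \vdash T \cdot \sigma_\Psi : K \cdot \sigma_\Psi$.

5. If $\Psi' \vdash \sigma_\Psi : \Psi$ and $\Psi'' \vdash \sigma'_\Psi : \Psi'$ then $\Psi'' \vdash \sigma_\Psi \cdot \sigma'_\Psi : \Psi$.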


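Part 1. Case $\dfrac{\Phi.I = t}{\Psi;\ \Phi \vdash f_I : t}$ $\triangleright$

We have $(\Phi \cdot \sigma_\Psi).(I \cdot \sigma_\Psi) = (\Phi.I) \cdot \sigma_\Psi$ from lemma B.84.

Case $\dfrac{\Psi.i = T \qquad T = [\Phi']\, t' \qquad \Psi;\ \Phi \vdash \sigma : \Phi'}{\Psi;\ \Phi \vdash X_i/\sigma : t' \cdot \sigma}$ $\triangleright$

From lemma B.93 we get $\Psi' \vdash \sigma_\Psi.i : (\Psi.i) \cdot \sigma_\Psi$.

Furthermore, this can be written as:

$\Psi' \vdash \sigma_\Psi.i : [\Phi' \cdot \sigma_\Psi]\, t' \cdot \sigma_\Psi$.

Thus by typing inversion, and assuming $\sigma_\Psi.i = [\Phi' \cdot \sigma_\Psi]\, t$, we get:

$\Psi';\ \Phi' \cdot \sigma_\Psi \vdash t : t' \cdot \sigma_\Psi$. From part 2 for $\sigma$ we get $\Psi';\ \Phi \cdot \sigma_\Psi \vdash \sigma \cdot \sigma_\Psi : \Phi' \cdot \sigma_\Psi$.

From lemma B.67 and the above we get $\Psi';\ \Phi \cdot \sigma_\Psi \vdash t \cdot (\sigma \cdot \sigma_\Psi) : (t' \cdot \sigma_\Psi) \cdot (\sigma \cdot \sigma_\Psi)$.

Using lemma B.90 we get that $(t' \cdot \sigma_\Psi) \cdot (\sigma \cdot \sigma_\Psi) = (t' \cdot \sigma) \cdot \sigma_\Psi$, thus the above is the desired.

Case (otherwise) $\triangleright$

The rest of the cases are trivial to adapt from lemma B.39 to account for indexes.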


Part  2.  The  cases  for  a  =  •  or  a  =  a',  t  are  entirely  similar  as  before. 

'F;  <I>  h  a  :  <F'  'F./  =  [<!>']  ctx  <!>',  X,  C  <F 

'F;  <F  h  (a,  id(W))  :(<!>',  W)  ^ 

In  this  case  we  need  to  prove  that  *F';  <F  •  axp  h  (a  •  aip,  idc,,,,!)  :  •  a>j<, 

By  induction  hypothesis  for  a  we  get  that  *F';  <I>  •  avp  h  a  •  a>j< :  <!>'  •  a>j<. 

From  lemma  B.93  we  also  get:  *F'  h  :  *F./  •  a>j/. 

We  have  that  *F./  =  [<!>']  ctx,  so  this  can  be  rewritten  as:  *F'  h  avp./ :  [<J>'  •  a>i<]  ctx. 

By  typing  inversion  get  =  [<!>'  •  a>i<]  <I>"  for  some  <I>"  and: 

'F'  h  [<!>'  •  avp]  <F"  :  [<F'  •  a>i-]  ctx. 

Now  proceed  by  induction  on  to  prove  that  *F';  <I>  •  h  (a  •  avp,  ido,j,.,)  :  (<!>'  •  avj/, 

When  <!>"  =  •,  trivial. 

When  <I>"  =  <!>'",  t,  have  *F';  <I>  •  avp  h  a  •  axp,  id[<j,/.o,j,]<i>/// :  (<!>'  •  a>i<,  <!>'")  by  induction  hypothesis.  We 
can  append to  this  substitution  and  get  the  desired,  because  -aij*!,  |<J>"'|)  < 

This  is  because  (<F',  X,)  C  <!>  thus  (<!>'  •  <!>"',  t)  C  <!>  and  thus  (|<I>'  •  •)  <  |<I>|.  When 

<I>"  =  Xj,  have  *F';  <I>  •  a>i<  h  a  •  a>i/,  id[cj,/.<j,j,]<i>///  •  Now  we  have  that  <F',  Xi  C  <!>, 

which  also  means  that  (<F'  •  a>i<,  <!>'",  Xj)  C  <!>  •  a>j(.  Thus  we  can  apply  the  typing  rule  for  id(Xy)  to 
get  that  *F';  <F  •  h  a  •  avp,  id[<j>/.cj,j,] om ,  id(Xy)  :  (<F'  •  avp,  <!>"',  Xj),  which  is  the  desired. 


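Part 3. Case $\dfrac{}{\Psi \vdash \bullet\ \text{wf}}$ $\triangleright$

Trivial.

Case $\dfrac{\Psi \vdash \Phi\ \text{wf} \qquad \Psi;\ \Phi \vdash t : s}{\Psi \vdash (\Phi,\ t)\ \text{wf}}$ $\triangleright$

By induction hypothesis we get $\Psi' \vdash \Phi \cdot \sigma_\Psi\ \text{wf}$.

By use of part 1 we get that $\Psi';\ \Phi \cdot \sigma_\Psi \vdash t \cdot \sigma_\Psi : s$.

Thus using the same typing rule we get the desired $\Psi' \vdash (\Phi \cdot \sigma_\Psi,\ t \cdot \sigma_\Psi)\ \text{wf}$.

Case $\dfrac{\Psi \vdash \Phi\ \text{wf} \qquad \Psi.i = [\Phi]\, \text{ctx}}{\Psi \vdash (\Phi,\ X_i)\ \text{wf}}$ $\triangleright$

By induction hypothesis we get $\Psi' \vdash \Phi \cdot \sigma_\Psi\ \text{wf}$.

By use of lemma B.93 we get that $\Psi' \vdash \sigma_\Psi.i : \Psi.i \cdot \sigma_\Psi$.

We have $\Psi.i = [\Phi]\, \text{ctx}$, thus the above can be rewritten as $\Psi' \vdash \sigma_\Psi.i : [\Phi \cdot \sigma_\Psi]\, \text{ctx}$.

By inversion of typing we get that $\sigma_\Psi.i = [\Phi \cdot \sigma_\Psi]\, \Phi'$ and that $\Psi' \vdash \Phi \cdot \sigma_\Psi,\ \Phi'\ \text{wf}$. This is exactly the desired result.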


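Part 4. Case $\dfrac{\Psi;\ \Phi \vdash t : t'}{\Psi \vdash [\Phi]\, t : [\Phi]\, t'}$ $\triangleright$

By use of part 1 we get that $\Psi';\ \Phi \cdot \sigma_\Psi \vdash t \cdot \sigma_\Psi : t' \cdot \sigma_\Psi$.

Thus by application of the same typing rule we get exactly the desired.

Case $\dfrac{\Psi \vdash \Phi,\ \Phi'\ \text{wf}}{\Psi \vdash [\Phi]\, \Phi' : [\Phi]\, \text{ctx}}$ $\triangleright$

By use of part 3 we get $\Psi' \vdash \Phi \cdot \sigma_\Psi,\ \Phi' \cdot \sigma_\Psi\ \text{wf}$.

Thus by the same typing rule we get exactly the desired.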


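Part 5. Case $\dfrac{}{\Psi' \vdash \bullet : \bullet}$ $\triangleright$

Trivial.

Case $\dfrac{\Psi' \vdash \sigma_\Psi : \Psi \qquad \Psi' \vdash T : K \cdot \sigma_\Psi}{\Psi' \vdash (\sigma_\Psi,\ T) : (\Psi,\ K)}$ $\triangleright$

By induction we get $\Psi'' \vdash \sigma_\Psi \cdot \sigma'_\Psi : \Psi$.

By use of part 4 we get $\Psi'' \vdash T \cdot \sigma'_\Psi : (K \cdot \sigma_\Psi) \cdot \sigma'_\Psi$.

This is equal to $K \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$ by use of lemma B.94. Thus we get the desired result by applying the same typing rule.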


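Lemma B.98 If $\Psi \vdash \Psi''\ \text{wf}$ and $\Psi' \vdash \sigma_\Psi : \Psi$ then $\Psi' \vdash \Psi'' \cdot \sigma_\Psi\ \text{wf}$.

By induction on the structure of $\Psi''$.

Case $\Psi'' = \bullet$ $\triangleright$ Trivial.

Case $\Psi'' = \Psi''_0, [\Phi]\, t$ $\triangleright$

By induction hypothesis we have that $\Psi' \vdash \Psi''_0 \cdot \sigma_\Psi\ \text{wf}$.

By inversion of well-formedness for $\Psi''_0, [\Phi]\, t$ we get:

$\Psi,\ \Psi''_0 \vdash [\Phi]\, t : [\Phi]\, s$.

We have for $\sigma'_\Psi = \sigma_\Psi,\ X_{|\Psi'|},\ \cdots$ that $\Psi',\ \Psi''_0 \cdot \sigma_\Psi \vdash \sigma'_\Psi : \Psi,\ \Psi''_0$.

Thus by application of lemma B.97, we get that:

$\Psi',\ \Psi''_0 \cdot \sigma_\Psi \vdash [\Phi \cdot \sigma'_\Psi]\, t \cdot \sigma'_\Psi : [\Phi \cdot \sigma'_\Psi]\, s$.

Thus $\vdash \Psi',\ (\Psi''_0, [\Phi]\, t) \cdot \sigma_\Psi\ \text{wf}$, which is the desired.

Case $\Psi'' = \Psi''_0, [\Phi]\, \text{ctx}$ $\triangleright$ Similarly as the previous case.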

B.3  Final  extension:  bound  extension  variables 

The metatheory presented in the previous subsection only has to do with meta and context variables that are free. We now introduce bound extension variables (which will be bound in the computational language), entirely similarly to how we have bound and free variables for the logic. We will not re-prove everything here; all theorems from above carry over exactly as they are. We will only prove two theorems that have to do with the interaction of freshen/bind and extension substitutions.

Definition B.99 (Syntax of the language) The syntax of the logic language is extended below.

\[
\begin{array}{lcl}
\Phi &::=& \cdots \mid \Phi,\ B_i \\
\sigma &::=& \cdots \mid \sigma,\ \mathsf{id}(B_i) \\
t &::=& \cdots \mid B_i/\sigma \\
I &::=& \cdots \mid I,\ |B_i|
\end{array}
\]

All the following definitions are extended trivially. Application of extension substitution leaves bound extension variables as they are. Bound extension variables are untypable.

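Definition B.100 (Freshening of extension variables) We define freshening similarly to normal variables. We do not define extension variables limits: we will use the condition of well-definedness later. (So if $\lceil t \rceil^{M}_{N,K}$ is well-defined, that means that it does not have extension variables larger than $N + K$.) Below, $\lceil \cdot \rceil$ abbreviates $\lceil \cdot \rceil^{M}_{N,K}$.

Indexes:

\[
\begin{array}{lcll}
\lceil \bullet \rceil &=& \bullet & \\
\lceil I,\ \cdot \rceil &=& \lceil I \rceil,\ \cdot & \\
\lceil I,\ |X_i| \rceil &=& \lceil I \rceil,\ |X_i| & \\
\lceil I,\ |B_{M+j}| \rceil &=& \lceil I \rceil,\ |X_{N+K-j-1}| & \text{when } j < K \\
\lceil I,\ |B_i| \rceil &=& \lceil I \rceil,\ |B_i| & \text{when } i < M
\end{array}
\]

Terms:

\[
\begin{array}{lcll}
\lceil f_I \rceil &=& f_{\lceil I \rceil} & \\
\lceil b_i \rceil &=& b_i & \\
\lceil X_i/\sigma \rceil &=& X_i/(\lceil \sigma \rceil) & \\
\lceil B_{M+j}/\sigma \rceil &=& X_{N+K-j-1}/(\lceil \sigma \rceil) & \text{when } j < K \\
\lceil B_i/\sigma \rceil &=& B_i/(\lceil \sigma \rceil) & \text{when } i < M
\end{array}
\]

Contexts:

\[
\begin{array}{lcll}
\lceil \bullet \rceil &=& \bullet & \\
\lceil \Phi,\ t \rceil &=& \lceil \Phi \rceil,\ \lceil t \rceil & \\
\lceil \Phi,\ X_i \rceil &=& \lceil \Phi \rceil,\ X_i & \\
\lceil \Phi,\ B_{M+j} \rceil &=& \lceil \Phi \rceil,\ X_{N+K-j-1} & \text{when } j < K \\
\lceil \Phi,\ B_i \rceil &=& \lceil \Phi \rceil,\ B_i & \text{when } i < M
\end{array}
\]

Substitutions:

\[
\begin{array}{lcll}
\lceil \bullet \rceil &=& \bullet & \\
\lceil \sigma,\ t \rceil &=& \lceil \sigma \rceil,\ \lceil t \rceil & \\
\lceil \sigma,\ \mathsf{id}(X_i) \rceil &=& \lceil \sigma \rceil,\ \mathsf{id}(X_i) & \\
\lceil \sigma,\ \mathsf{id}(B_{M+j}) \rceil &=& \lceil \sigma \rceil,\ \mathsf{id}(X_{N+K-j-1}) & \text{when } j < K \\
\lceil \sigma,\ \mathsf{id}(B_i) \rceil &=& \lceil \sigma \rceil,\ \mathsf{id}(B_i) & \text{when } i < M
\end{array}
\]

Modal terms and contexts $T$:

\[
\begin{array}{lcl}
\lceil [\Phi]\, t \rceil &=& [\lceil \Phi \rceil]\, (\lceil t \rceil) \\
\lceil [\Phi]\, \Phi' \rceil &=& [\lceil \Phi \rceil]\, (\lceil \Phi' \rceil)
\end{array}
\]

Classifiers $K$:

\[
\begin{array}{lcl}
\lceil [\Phi]\, t \rceil &=& [\lceil \Phi \rceil]\, (\lceil t \rceil) \\
\lceil [\Phi]\, \text{ctx} \rceil &=& [\lceil \Phi \rceil]\, \text{ctx}
\end{array}
\]

Extension contexts:

\[
\lceil \bullet \rceil^{M}_{N,K} = \bullet
\]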


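Definition B.101 (Binding of extension variables) We define binding similarly to normal variables. Note that this is a bit different (because binding many variables at once is permitted), so the $N$ parameter is the length of the resulting context (the number of free variables after binding has taken place), while $N + K$ is the length of the context where the bind argument is currently in. Below, $\lfloor \cdot \rfloor$ abbreviates $\lfloor \cdot \rfloor^{M}_{N,K}$.

Indexes:

\[
\begin{array}{lcll}
\lfloor \bullet \rfloor &=& \bullet & \\
\lfloor I,\ \cdot \rfloor &=& \lfloor I \rfloor,\ \cdot & \\
\lfloor I,\ |X_{N+j}| \rfloor &=& \lfloor I \rfloor,\ |B_{M+K-j-1}| & \text{when } j < K \\
\lfloor I,\ |X_i| \rfloor &=& \lfloor I \rfloor,\ |X_i| & \text{when } i < N \\
\lfloor I,\ |B_i| \rfloor &=& \lfloor I \rfloor,\ |B_i| &
\end{array}
\]

Terms:

\[
\begin{array}{lcll}
\lfloor f_I \rfloor &=& f_{\lfloor I \rfloor} & \\
\lfloor b_i \rfloor &=& b_i & \\
\lfloor \lambda(t_1).t_2 \rfloor &=& \lambda(\lfloor t_1 \rfloor).(\lfloor t_2 \rfloor) & \\
\lfloor X_{N+j}/\sigma \rfloor &=& B_{M+K-j-1}/(\lfloor \sigma \rfloor) & \text{when } j < K \\
\lfloor X_i/\sigma \rfloor &=& X_i/(\lfloor \sigma \rfloor) & \text{when } i < N
\end{array}
\]

Contexts:

\[
\begin{array}{lcll}
\lfloor \bullet \rfloor &=& \bullet & \\
\lfloor \Phi,\ t \rfloor &=& \lfloor \Phi \rfloor,\ \lfloor t \rfloor & \\
\lfloor \Phi,\ X_{N+j} \rfloor &=& \lfloor \Phi \rfloor,\ B_{M+K-j-1} & \text{when } j < K \\
\lfloor \Phi,\ X_i \rfloor &=& \lfloor \Phi \rfloor,\ X_i & \text{when } i < N \\
\lfloor \Phi,\ B_i \rfloor &=& \lfloor \Phi \rfloor,\ B_i &
\end{array}
\]

Substitutions:

\[
\begin{array}{lcll}
\lfloor \bullet \rfloor &=& \bullet & \\
\lfloor \sigma,\ t \rfloor &=& \lfloor \sigma \rfloor,\ \lfloor t \rfloor & \\
\lfloor \sigma,\ \mathsf{id}(X_{N+j}) \rfloor &=& \lfloor \sigma \rfloor,\ \mathsf{id}(B_{M+K-j-1}) & \text{when } j < K \\
\lfloor \sigma,\ \mathsf{id}(X_i) \rfloor &=& \lfloor \sigma \rfloor,\ \mathsf{id}(X_i) & \text{when } i < N \\
\lfloor \sigma,\ \mathsf{id}(B_i) \rfloor &=& \lfloor \sigma \rfloor,\ \mathsf{id}(B_i) &
\end{array}
\]

Modal terms and contexts $T$:

\[
\begin{array}{lcl}
\lfloor [\Phi]\, t \rfloor &=& [\lfloor \Phi \rfloor]\, (\lfloor t \rfloor) \\
\lfloor [\Phi]\, \Phi' \rfloor &=& [\lfloor \Phi \rfloor]\, (\lfloor \Phi' \rfloor)
\end{array}
\]

Classifiers $K$:

\[
\begin{array}{lcl}
\lfloor [\Phi]\, t \rfloor &=& [\lfloor \Phi \rfloor]\, (\lfloor t \rfloor) \\
\lfloor [\Phi]\, \text{ctx} \rfloor &=& [\lfloor \Phi \rfloor]\, \text{ctx}
\end{array}
\]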


Definition  B.102  Opening  up  and  closing  down  an  extension  context  works  as  follows: 


no 


]^,K\n 


l^iN 


J  •  U  =  • 
\^,kIn  =  J'FU, 


Now we prove a couple of theorems.

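Lemma B.103 (Freshening of extension variables and extension substitution) Assuming $|\sigma_\Psi| = N$, and that $\lceil \cdot \rceil^{M}_{N,K}$ and $\lceil \cdot \rceil^{M}_{N',K}$ are well-defined, we have:

1. $\lceil I \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil I \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$

2. $\lceil t \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil t \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$

3. $\lceil \Phi \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil \Phi \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$

4. $\lceil \sigma \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil \sigma \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$

5. $\lceil T \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil T \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$

6. $\lceil \mathcal{K} \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil \mathcal{K} \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$

7. $\lceil \Psi \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil \Psi \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$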

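Part 2 By induction on $t$ and use of the rest of the parts. The interesting case is $t = B_{M+j}/\sigma$ with
$j < K$. We have that the left-hand-side is equal to $X_{N'+K-j-1}/(\lceil \sigma \cdot \sigma_\Psi \rceil^{M}_{N',K})$, which by part 4 is equal to
$X_{N'+K-j-1}/(\lceil \sigma \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}))$.

The right-hand-side is equal to $(X_{N+K-j-1}/\lceil \sigma \rceil^{M}_{N,K}) \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) =
X_{N'+K-j-1}/(\lceil \sigma \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}))$, which is exactly equal to the left-hand-side.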


Part 7 By induction on $\Psi$. The interesting case occurs when $\Psi = \Psi',\, \mathcal{K}$.
In that case, we have that the left-hand-side is equal to:

["'T'-avp,  K-  (aip,  X^i,  ••• , 


\N',K' 


Sinee  K  does  not  eontain  variables  bigger  than  (sinee  \K\^ is  well-defined),  we  have  that  this  is  further 
equal  to: 

This  is  then  equal  to: 

AT)  Setting a(j,  =  a>i<,  Xa?/, 

6  that  this  is  equal  to: 


^N'+K-i  we  have  by  induetion  hypothesis  and  part 


NX  ■  5  N,K  ‘ 

The  right-hand-side  is  equal  to: 


/  t  !  V  V 


Sinee  J  is  well-defined,  we  have  fhaf  if  does  nof  eonfain  variables  larger  fhan  Xn+k-\,  and  fhus  we 
have: 

,  Xn'+k-i,  •••  ,  Av'+^:-i+|‘i"|  —  \J^\nx 
Thus  fhe  fwo  sides  are  equal. 


Y  _  r i/'~\ 
Rest By direct application of the other parts.

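Lemma B.104 (Binding of extension variables and extension substitution) Assuming $|\sigma_\Psi| = N$, and that $\lfloor \cdot \rfloor^{M}_{N,K}$ and
$\lfloor \cdot \rfloor^{M}_{N',K}$ are well-defined, we have:

1. $\lfloor I \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor I \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

2. $\lfloor t \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor t \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

3. $\lfloor \Phi \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor \Phi \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

4. $\lfloor \sigma \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor \sigma \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

5. $\lfloor T \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor T \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

6. $\lfloor \mathcal{K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor \mathcal{K} \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

7. $\lfloor \Psi \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor \Psi \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

Part 2 The interesting case is when $t = X_{N+j}/\sigma$ with $j < K$. In that case, the left-hand-side becomes:

$\lfloor X_{N'+j}/(\sigma \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})) \rfloor^{M}_{N',K} = B_{M+K-j-1}/(\lfloor \sigma \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K}) = B_{M+K-j-1}/(\lfloor \sigma \rfloor^{M}_{N,K} \cdot \sigma_\Psi)$ by part 4.

The right-hand-side becomes $(B_{M+K-j-1}/\lfloor \sigma \rfloor^{M}_{N,K}) \cdot \sigma_\Psi = B_{M+K-j-1}/(\lfloor \sigma \rfloor^{M}_{N,K} \cdot \sigma_\Psi)$.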

Rest  Again,  simple  by  induction  and  use  of  other  parts;  similarly  as  above. 

Lemma B.105 1. $\lfloor \lceil I \rceil^{M}_{N,K} \rfloor^{M}_{N,K} = I$


Trivial  by  structural  induction. 
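The round-trip property can be checked mechanically on the variable clauses alone. The sketch below assumes a hypothetical encoding of variables as tagged pairs, and the helper names are illustrative; it only exercises the variable cases, not full terms.

```python
def freshen_var(var, M, N, K):
    """B_{M+j} with j < K becomes X_{N+K-j-1}; everything else is unchanged."""
    kind, i = var
    if kind == "B" and i >= M:
        j = i - M
        if j >= K:
            raise ValueError("freshening not well-defined")
        return ("X", N + K - j - 1)
    return var


def bind_var(var, M, N, K):
    """X_{N+j} with j < K becomes B_{M+K-j-1}; everything else is unchanged."""
    kind, i = var
    if kind == "X" and i >= N:
        j = i - N
        if j >= K:
            raise ValueError("binding not well-defined")
        return ("B", M + K - j - 1)
    return var


def roundtrip_holds(M, N, K):
    """Check bind(freshen(B_i)) == B_i for every bound variable with i < M + K."""
    bound = [("B", i) for i in range(M + K)]
    return all(bind_var(freshen_var(v, M, N, K), M, N, K) == v for v in bound)
```

The check succeeds because freshening sends offset j to K-j-1 and binding sends it back: K-(K-j-1)-1 = j.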

Lemma  B.106  (C  |ct>i<|  =  |*T|  and  1  •  fuj/i  and  ]  •  \ nf</|  are  well-defined,  then  ]  \ nj(|  -aiii  =)  \ nj</ 

By  induction  on  'T". 

When  *T"  =  •,  trivial. 

When  'T"  =  'T",  K  we  have  that: 

Applying  avj/  to  this  we  get: 

"I  flVj/l  -CJvj/,  |'.^L]  |vj(|  •  (ctxfi,  A|VJ</| ,  •  •  •  ,  ). 

By  induction  hypothesis  the  first  part  is  equal  to: 
Using lemma B.87 for the second part we get that it's equal to:

Furthermore,  sinee^does  not  eontain  variables  greater  than  Xnj<| ,  we  have  that^-avp  =  K-  (avp,  Xnj(/|,  •  •  •  ,  Xnp/j+nj;// 
Thus,  the  left  hand  side  is  equal  to  ['F"  •  a>j/,  K  •  (avp,  X|vj</| ,  •  •  •  1  whieh  is  equal  to  the  right- 

hand-side. 

C.  Definition  and  metatheory  of  computational  language 

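Definition C.1 The syntax of the computational language is defined below.

\[
\begin{array}{lcl}
k & ::= & * \mid k \to k \mid \Pi(\mathcal{K}).k \\
\tau & ::= & \Pi(\mathcal{K}).\tau \mid \Sigma(\mathcal{K}).\tau \mid \lambda(\mathcal{K}).\tau \mid \tau\; T \\
 & \mid & \mathrm{unit} \mid \bot \mid \tau_1 \to \tau_2 \mid \tau_1 \times \tau_2 \mid \tau_1 + \tau_2 \mid \mu\alpha : k.\tau \mid \mathrm{ref}\; \tau \mid \forall\alpha : k.\tau \mid \lambda\alpha : k.\tau \mid \tau_1\; \tau_2 \mid \alpha \\
e & ::= & \Lambda(\mathcal{K}).e \mid e\; T \mid \mathrm{pack}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; e \mid \mathrm{unpack}\; e\; (.)x.(e') \\
 & \mid & () \mid \mathrm{error} \mid \lambda x : \tau.e \mid e\; e' \mid x \mid (e,\, e') \mid \mathrm{proj}_i\; e \mid \mathrm{inj}_i\; e \mid \mathrm{case}(e,\, x.e',\, x.e'') \mid \mathrm{fold}\; e \mid \mathrm{unfold}\; e \mid \mathrm{ref}\; e \\
 & \mid & e := e' \mid\; !e \mid l \mid \Lambda\alpha : k.e \mid e\; \tau \mid \mathrm{fix}\; x : \tau.e \\
 & \mid & \mathrm{unify}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; (\Psi.T' \mapsto e') \\
\Gamma & ::= & \bullet \mid \Gamma,\, x : \tau \mid \Gamma,\, \alpha : k \\
\Sigma & ::= & \bullet \mid \Sigma,\, l : \tau
\end{array}
\]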

Definition  C.2  Freshening  and  binding  for  computational  kinds,  types  and  terms  are  defined  as  follows. 


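Freshening is defined structurally. In the clauses below, $\lceil \cdot \rceil^{M}$ abbreviates $\lceil \cdot \rceil^{M}_{N,K}$; the superscript increases whenever an extension binder is crossed.

\[
\begin{array}{lcl}
\lceil * \rceil^{M} & = & * \\
\lceil k \to k' \rceil^{M} & = & \lceil k \rceil^{M} \to \lceil k' \rceil^{M} \\
\lceil \Pi(\mathcal{K}).k \rceil^{M} & = & \Pi(\lceil \mathcal{K} \rceil^{M}).\lceil k \rceil^{M+1} \\[1ex]
\lceil \Pi(\mathcal{K}).\tau \rceil^{M} & = & \Pi(\lceil \mathcal{K} \rceil^{M}).\lceil \tau \rceil^{M+1} \\
\lceil \Sigma(\mathcal{K}).\tau \rceil^{M} & = & \Sigma(\lceil \mathcal{K} \rceil^{M}).\lceil \tau \rceil^{M+1} \\
\lceil \lambda(\mathcal{K}).\tau \rceil^{M} & = & \lambda(\lceil \mathcal{K} \rceil^{M}).\lceil \tau \rceil^{M+1} \\
\lceil \tau\; T \rceil^{M} & = & \lceil \tau \rceil^{M}\; \lceil T \rceil^{M} \\
\lceil \mathrm{unit} \rceil^{M} & = & \mathrm{unit} \\
\lceil \bot \rceil^{M} & = & \bot \\
\lceil \tau_1 \to \tau_2 \rceil^{M} & = & \lceil \tau_1 \rceil^{M} \to \lceil \tau_2 \rceil^{M} \\
\lceil \tau_1 \times \tau_2 \rceil^{M} & = & \lceil \tau_1 \rceil^{M} \times \lceil \tau_2 \rceil^{M} \\
\lceil \tau_1 + \tau_2 \rceil^{M} & = & \lceil \tau_1 \rceil^{M} + \lceil \tau_2 \rceil^{M} \\
\lceil \mu\alpha : k.\tau \rceil^{M} & = & \mu\alpha : \lceil k \rceil^{M}.\lceil \tau \rceil^{M} \\
\lceil \mathrm{ref}\; \tau \rceil^{M} & = & \mathrm{ref}\; \lceil \tau \rceil^{M} \\
\lceil \forall\alpha : k.\tau \rceil^{M} & = & \forall\alpha : \lceil k \rceil^{M}.\lceil \tau \rceil^{M} \\
\lceil \lambda\alpha : k.\tau \rceil^{M} & = & \lambda\alpha : \lceil k \rceil^{M}.\lceil \tau \rceil^{M} \\
\lceil \tau_1\; \tau_2 \rceil^{M} & = & \lceil \tau_1 \rceil^{M}\; \lceil \tau_2 \rceil^{M} \\
\lceil \alpha \rceil^{M} & = & \alpha \\[1ex]
\lceil \Lambda(\mathcal{K}).e \rceil^{M} & = & \Lambda(\lceil \mathcal{K} \rceil^{M}).\lceil e \rceil^{M+1} \\
\lceil e\; T \rceil^{M} & = & \lceil e \rceil^{M}\; \lceil T \rceil^{M} \\
\lceil \mathrm{pack}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; e \rceil^{M} & = & \mathrm{pack}\; \lceil T \rceil^{M}\; \mathrm{return}\; (.\lceil \tau \rceil^{M+1})\; \mathrm{with}\; \lceil e \rceil^{M} \\
\lceil \mathrm{unpack}\; e\; (.)x.(e') \rceil^{M} & = & \mathrm{unpack}\; \lceil e \rceil^{M}\; (.)x.(\lceil e' \rceil^{M+1}) \\
\lceil () \rceil^{M} & = & () \\
\lceil \mathrm{error} \rceil^{M} & = & \mathrm{error} \\
\lceil \lambda x : \tau.e \rceil^{M} & = & \lambda x : \lceil \tau \rceil^{M}.\lceil e \rceil^{M} \\
\lceil e_1\; e_2 \rceil^{M} & = & \lceil e_1 \rceil^{M}\; \lceil e_2 \rceil^{M} \\
\lceil x \rceil^{M} & = & x \\
\lceil (e,\, e') \rceil^{M} & = & (\lceil e \rceil^{M},\, \lceil e' \rceil^{M}) \\
\lceil \mathrm{proj}_i\; e \rceil^{M} & = & \mathrm{proj}_i\; \lceil e \rceil^{M} \\
\lceil \mathrm{inj}_i\; e \rceil^{M} & = & \mathrm{inj}_i\; \lceil e \rceil^{M} \\
\lceil \mathrm{case}(e,\, x.e',\, x.e'') \rceil^{M} & = & \mathrm{case}(\lceil e \rceil^{M},\, x.\lceil e' \rceil^{M},\, x.\lceil e'' \rceil^{M}) \\
\lceil \mathrm{fold}\; e \rceil^{M} & = & \mathrm{fold}\; \lceil e \rceil^{M} \\
\lceil \mathrm{unfold}\; e \rceil^{M} & = & \mathrm{unfold}\; \lceil e \rceil^{M} \\
\lceil \mathrm{ref}\; e \rceil^{M} & = & \mathrm{ref}\; \lceil e \rceil^{M} \\
\lceil e_1 := e_2 \rceil^{M} & = & \lceil e_1 \rceil^{M} := \lceil e_2 \rceil^{M} \\
\lceil\, !e \rceil^{M} & = & !\lceil e \rceil^{M} \\
\lceil l \rceil^{M} & = & l \\
\lceil \Lambda\alpha : k.e \rceil^{M} & = & \Lambda\alpha : \lceil k \rceil^{M}.\lceil e \rceil^{M} \\
\lceil e\; \tau \rceil^{M} & = & \lceil e \rceil^{M}\; \lceil \tau \rceil^{M} \\
\lceil \mathrm{fix}\; x : \tau.e \rceil^{M} & = & \mathrm{fix}\; x : \lceil \tau \rceil^{M}.\lceil e \rceil^{M} \\
\lceil \mathrm{unify}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; (\Psi.T' \mapsto e') \rceil^{M} & = & \mathrm{unify}\; \lceil T \rceil^{M}\; \mathrm{return}\; (.\lceil \tau \rceil^{M+1})\; \mathrm{with}\; (\lceil \Psi \rceil^{M}.\lceil T' \rceil^{M+|\Psi|} \mapsto \lceil e' \rceil^{M+|\Psi|})
\end{array}
\]

Binding $\lfloor \cdot \rfloor^{M}_{N,K}$ is defined by the same clauses with $\lfloor \cdot \rfloor$ in place of $\lceil \cdot \rceil$ throughout. In particular:

\[
\begin{array}{lcl}
\lfloor \Pi(\mathcal{K}).k \rfloor^{M} & = & \Pi(\lfloor \mathcal{K} \rfloor^{M}).\lfloor k \rfloor^{M+1} \\
\lfloor \mathrm{pack}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; e \rfloor^{M} & = & \mathrm{pack}\; \lfloor T \rfloor^{M}\; \mathrm{return}\; (.\lfloor \tau \rfloor^{M+1})\; \mathrm{with}\; \lfloor e \rfloor^{M} \\
\lfloor \mathrm{unpack}\; e\; (.)x.(e') \rfloor^{M} & = & \mathrm{unpack}\; \lfloor e \rfloor^{M}\; (.)x.(\lfloor e' \rfloor^{M+1}) \\
\lfloor \mathrm{unify}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; (\Psi.T' \mapsto e') \rfloor^{M} & = & \mathrm{unify}\; \lfloor T \rfloor^{M}\; \mathrm{return}\; (.\lfloor \tau \rfloor^{M+1})\; \mathrm{with}\; (\lfloor \Psi \rfloor^{M}.\lfloor T' \rfloor^{M+|\Psi|} \mapsto \lfloor e' \rfloor^{M+|\Psi|})
\end{array}
\]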


Definition  C.3  Extension  substitution  application  to  computational-level  kinds,  types  and  terms. 


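\[
\begin{array}{lcl}
* \cdot \sigma_\Psi & = & * \\
(k \to k') \cdot \sigma_\Psi & = & k \cdot \sigma_\Psi \to k' \cdot \sigma_\Psi \\
(\Pi(\mathcal{K}).k) \cdot \sigma_\Psi & = & \Pi(\mathcal{K} \cdot \sigma_\Psi).k \cdot \sigma_\Psi \\[1ex]
(\Pi(\mathcal{K}).\tau) \cdot \sigma_\Psi & = & \Pi(\mathcal{K} \cdot \sigma_\Psi).\tau \cdot \sigma_\Psi \\
(\Sigma(\mathcal{K}).\tau) \cdot \sigma_\Psi & = & \Sigma(\mathcal{K} \cdot \sigma_\Psi).\tau \cdot \sigma_\Psi \\
(\lambda(\mathcal{K}).\tau) \cdot \sigma_\Psi & = & \lambda(\mathcal{K} \cdot \sigma_\Psi).\tau \cdot \sigma_\Psi \\
(\tau\; T) \cdot \sigma_\Psi & = & \tau \cdot \sigma_\Psi\;\; T \cdot \sigma_\Psi \\
\mathrm{unit} \cdot \sigma_\Psi & = & \mathrm{unit} \\
\bot \cdot \sigma_\Psi & = & \bot \\
(\tau_1 \to \tau_2) \cdot \sigma_\Psi & = & \tau_1 \cdot \sigma_\Psi \to \tau_2 \cdot \sigma_\Psi \\
(\tau_1 \times \tau_2) \cdot \sigma_\Psi & = & \tau_1 \cdot \sigma_\Psi \times \tau_2 \cdot \sigma_\Psi \\
(\tau_1 + \tau_2) \cdot \sigma_\Psi & = & \tau_1 \cdot \sigma_\Psi + \tau_2 \cdot \sigma_\Psi \\
(\mu\alpha : k.\tau) \cdot \sigma_\Psi & = & \mu\alpha : k \cdot \sigma_\Psi.\tau \cdot \sigma_\Psi \\
(\mathrm{ref}\; \tau) \cdot \sigma_\Psi & = & \mathrm{ref}\; \tau \cdot \sigma_\Psi \\
(\forall\alpha : k.\tau) \cdot \sigma_\Psi & = & \forall\alpha : k \cdot \sigma_\Psi.\tau \cdot \sigma_\Psi \\
(\lambda\alpha : k.\tau) \cdot \sigma_\Psi & = & \lambda\alpha : k \cdot \sigma_\Psi.\tau \cdot \sigma_\Psi \\
(\tau_1\; \tau_2) \cdot \sigma_\Psi & = & \tau_1 \cdot \sigma_\Psi\;\; \tau_2 \cdot \sigma_\Psi \\
\alpha \cdot \sigma_\Psi & = & \alpha \\[1ex]
(\Lambda(\mathcal{K}).e) \cdot \sigma_\Psi & = & \Lambda(\mathcal{K} \cdot \sigma_\Psi).e \cdot \sigma_\Psi \\
(e\; T) \cdot \sigma_\Psi & = & e \cdot \sigma_\Psi\;\; T \cdot \sigma_\Psi \\
(\mathrm{pack}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; e) \cdot \sigma_\Psi & = & \mathrm{pack}\; T \cdot \sigma_\Psi\; \mathrm{return}\; (.\tau \cdot \sigma_\Psi)\; \mathrm{with}\; e \cdot \sigma_\Psi \\
(\mathrm{unpack}\; e\; (.)x.(e')) \cdot \sigma_\Psi & = & \mathrm{unpack}\; e \cdot \sigma_\Psi\; (.)x.(e' \cdot \sigma_\Psi) \\
() \cdot \sigma_\Psi & = & () \\
\mathrm{error} \cdot \sigma_\Psi & = & \mathrm{error} \\
(\lambda x : \tau.e) \cdot \sigma_\Psi & = & \lambda x : \tau \cdot \sigma_\Psi.e \cdot \sigma_\Psi \\
(e\; e') \cdot \sigma_\Psi & = & e \cdot \sigma_\Psi\;\; e' \cdot \sigma_\Psi \\
x \cdot \sigma_\Psi & = & x \\
(e,\, e') \cdot \sigma_\Psi & = & (e \cdot \sigma_\Psi,\, e' \cdot \sigma_\Psi) \\
(\mathrm{proj}_i\; e) \cdot \sigma_\Psi & = & \mathrm{proj}_i\; e \cdot \sigma_\Psi \\
(\mathrm{inj}_i\; e) \cdot \sigma_\Psi & = & \mathrm{inj}_i\; e \cdot \sigma_\Psi \\
(\mathrm{case}(e,\, x.e',\, x.e'')) \cdot \sigma_\Psi & = & \mathrm{case}(e \cdot \sigma_\Psi,\, x.e' \cdot \sigma_\Psi,\, x.e'' \cdot \sigma_\Psi) \\
(\mathrm{fold}\; e) \cdot \sigma_\Psi & = & \mathrm{fold}\; e \cdot \sigma_\Psi \\
(\mathrm{unfold}\; e) \cdot \sigma_\Psi & = & \mathrm{unfold}\; e \cdot \sigma_\Psi \\
(\mathrm{ref}\; e) \cdot \sigma_\Psi & = & \mathrm{ref}\; e \cdot \sigma_\Psi \\
(e := e') \cdot \sigma_\Psi & = & e \cdot \sigma_\Psi := e' \cdot \sigma_\Psi \\
(!e) \cdot \sigma_\Psi & = & !e \cdot \sigma_\Psi \\
l \cdot \sigma_\Psi & = & l \\
(\Lambda\alpha : k.e) \cdot \sigma_\Psi & = & \Lambda\alpha : k \cdot \sigma_\Psi.e \cdot \sigma_\Psi \\
(e\; \tau) \cdot \sigma_\Psi & = & e \cdot \sigma_\Psi\;\; \tau \cdot \sigma_\Psi \\
(\mathrm{fix}\; x : \tau.e) \cdot \sigma_\Psi & = & \mathrm{fix}\; x : \tau \cdot \sigma_\Psi.e \cdot \sigma_\Psi \\
(\mathrm{unify}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; (\Psi.T' \mapsto e')) \cdot \sigma_\Psi & = & \mathrm{unify}\; T \cdot \sigma_\Psi\; \mathrm{return}\; (.\tau \cdot \sigma_\Psi)\; \mathrm{with}\; (\Psi \cdot \sigma_\Psi.T' \cdot \sigma_\Psi \mapsto e' \cdot \sigma_\Psi) \\[1ex]
\bullet \cdot \sigma_\Psi & = & \bullet \\
(\Gamma,\, x : \tau) \cdot \sigma_\Psi & = & \Gamma \cdot \sigma_\Psi,\, x : \tau \cdot \sigma_\Psi \\
(\Gamma,\, \alpha : k) \cdot \sigma_\Psi & = & \Gamma \cdot \sigma_\Psi,\, \alpha : k \cdot \sigma_\Psi
\end{array}
\]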
Definition C.4 The typing judgements for the computational language are given below.


h'Pwf  'Fhyt'wf  h'F,  i^wf  'F,  1  wf 

'Ph*wf  wf  'Fhn(i^)lwf 


'P;  rhx:A: 


'P;  rhn(^).x:* 

'¥-rhx:n{K).k  '¥hT:K 
'P;  rhxr  :  [A:]|,p|  i-(idvi,,  T) 

'P;rhxi:*  'F;rhx2:* 

*P;  rh  Xi  X  X2  :  * 


'F,  rh  [xli^l^i  :* 
'F;  rhr(i^).x:* 


'P;rhunit:*  'F;rh±:* 

'F;rhxi:*  'F;rhx2:* 
'F;rhxi+X2:* 


'F,  rh  rxl|.p|,i 
'F;  rhX(^).x:n(^).[A:J|^|  i 

'F;  r  h  Xi  :  *  'F;  r  h  X2  :  * 

'¥hkwf  '¥-r,  a:khx:k 
^‘■rhpa-.k.x-.k 


'P;rhx:*  'Phytwf  'F;r,  a:A:hx:*  'Ph)twf  'F;r,  a:ythx:yt' 

'P;  r  h  ref  X  :  *  'F;  Th  Va  :  )t.x  :  *  '¥■  Fh  Xa:  k.x  :  k  ^  k' 


'F-rhxi:k^k'  '¥-rhx2-k 
rhxi  Z2--k' 


(a  :  A:)  G  r 

'F;  r  h  a  :  A: 


'P;  I;  r  h  e  :  X 


'P,  I;  rh  [e]|^l  1  :x  »F;  I;  T  h  e  :  n(i^).x  'FhT'.K 

»P;  I;  rhA(^).e:n(^).[xJ|^l  1  'F;  I;  rheJ:  [x]  |>j,|  j  •  (id^,  T) 

»Phr:^  »F,  rh  [xlivpi.i  :*  'F;  I;  Th  e  :  [xl|^l  i  •  (id^,  T) 

*P;  Z;  r  h  pack  T  return  (.x)  with  e  :  I.{K).x 

'P;  I;  rhe:r(^).x  'F,  I;  T,  x  :  [x] ,^1  i  h  [e']  :  x'  »F;rhx':* 

'F-  I;  r  h  unpack  e  (.)x.(e')  :x'  'F;  I;  Th  ()  :  unit 

'P;  I;  r,  X  :  X  h  e  :  x' 

'P;  I;  rh  error  :x  'F;  I;  Th  Xx  :  x.e  :  x  ^  x' 


'F;  I;  r  h  e  :  X 


(x  :  x)  G  r 

'F;  I;  rhx:x 


>y;£;rhei:xi 

'P;  I;  rh  (ei,  €2}  :  Xi  x  X2 


'¥■  I;  The: 


'P;  I;  r  h  e  e  :  x' 


*P;  Z;  r  h  e  :  Xi  X  X2  /  =  1  or  2 
'P;  I;  r  h  proj,-  e  :  x,- 


'F;r;rhe:x,  /  =  1  or  2 
'F;  I;  rh  inj^e  :  Xi  +X2 


*P;  Z;  r  h  e  :  Xi  +  X2  *F;  Z;  F,  x  :  Xi  h  ei  :  x  *F;  Z;  F,  x  :  X2  F  ^2  : 
'F;  Z;  Fhcase(e,  x.ei, x.e2)  :  X 


*F;  F  h  /ra  :  fc.x  :  ^  *F;  Z;  F  h  e  :  x[/ra  :  k.x/a]  Xi  X2  •  •  •  x„ 

*F;  Z;  F  h  fold  e  :  (/ra  :  k.i)  Xi  X2  •  •  •  x„ 


*F;  F  h  /ra  :  ^.x  :  *F;  Z;  F  h  e  :  (^a  :  k.x)  Xi  X2  •  •  •  x„ 

*F;  Z;  F  h  unfold  e  :  x[/ra  :  ^.x/a]  Xi  X2  •  •  •  x„ 


'F;  Z;  Fhe:x 
*F;  Z;  Fh  ref  e  :  ref  x 


'F;  Z;  Fhe:refx  'F;Z;Fhe':x 
'F;  Z;  Fhe:=e':unlt 


'F;  Z;  Fhe:  refx 
'F;  Z;  Fh!e:x 


(/ :  x)  G  Z 
'F;  Z;  F  h  Z :  ref  X 


'F;  Z;  F,  a:Z:he:x  'F;  Z;  F  h  e  :  Ha  :  Z:.x'  'F;Fhx:Z:  'F;  Z;  F,  x  :  X  h  e  :  x 

'F;  Z;  FhAa:Z:.e:na:Z:.x  'F;  Z;  Fhex:x'[x/a]  'F;  Z;  F h  fixx  :  x.e  :  x 


»Fhr:^  »F,  Fh  [xl|^l  1  :*  'F  h  [»F']  wf 
'F,  ['F'Ji^ih  'F,  ['F']|^|;Z;Fh  rxl|>j,|^i-(ld^,  [F'] 

*F;  Z;  F  h  unify  T  return  (.x)  with  (*F'.r'  i-g  e')  :  ( [x]  nj,|  j  •  (ld>i<,  T))  +  unit 


'FhFwf 

'FhFwf  'F;Fh/cwf 

'FhFwf  'F;Fhx:* 

'Fh*wf 

'F  h  (F,  a  :  F)  wf 

'F  h  (F,  X  :  x)  wf 

hZwf 

hZwf 

•  ;  •  h  X  :  * 

h  •  wf  h  (Z,  Z :  x) 


Definition C.5 $\beta$-equivalence for types $\tau$ is the symmetric, reflexive, transitive congruence closure of the
following relation. Types of the language are viewed implicitly up to $\beta$-equivalence. This means that the lemmas
that we prove about types need to agree on $\beta$-equivalent types.

$(\lambda\alpha : k.\tau)\; \tau' =_\beta \tau[\tau'/\alpha]$


Definition  C.6  Small-step  operational  semantics  for  the  language  are  defined  below. 

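\[
\begin{array}{lcl}
v & ::= & \Lambda(\mathcal{K}).e \mid \mathrm{pack}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; v \mid () \mid \lambda x : \tau.e \mid (v,\, v') \mid \mathrm{inj}_i\; v \mid \mathrm{fold}\; v \mid l \mid \Lambda\alpha : k.e \\
E & ::= & \bullet \mid E\; T \mid \mathrm{pack}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; E \mid \mathrm{unpack}\; E\; (.)x.(e') \mid E\; e' \mid v\; E \mid (E,\, e) \mid (v,\, E) \mid \mathrm{proj}_i\; E \mid \mathrm{inj}_i\; E \\
 & \mid & \mathrm{case}(E,\, x.e_1,\, x.e_2) \mid \mathrm{fold}\; E \mid \mathrm{unfold}\; E \mid \mathrm{ref}\; E \mid E := e' \mid v := E \mid\; !E \mid E\; \tau \\
\mu & ::= & \bullet \mid \mu,\, l \mapsto v
\end{array}
\]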
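The step judgement has the forms $(\mu, e) \longrightarrow (\mu', e')$ and $(\mu, e) \longrightarrow \mathrm{error}$.

\[
(\mu,\, E[\mathrm{error}]) \longrightarrow \mathrm{error}
\qquad
\frac{(\mu,\, e) \longrightarrow (\mu',\, e')}{(\mu,\, E[e]) \longrightarrow (\mu',\, E[e'])}
\]

\[
\begin{array}{l}
(\mu,\, (\Lambda(\mathcal{K}).e)\; T) \longrightarrow (\mu,\, \lceil e \rceil_{0,1} \cdot T) \\
(\mu,\, \mathrm{unpack}\; (\mathrm{pack}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; v)\; (.)x.(e')) \longrightarrow (\mu,\, (\lceil e' \rceil_{0,1} \cdot T)[v/x]) \\
(\mu,\, (\lambda x : \tau.e)\; v) \longrightarrow (\mu,\, e[v/x]) \\
(\mu,\, \mathrm{proj}_i\,(v_1,\, v_2)) \longrightarrow (\mu,\, v_i) \\
(\mu,\, \mathrm{case}(\mathrm{inj}_i\; v,\, x.e_1,\, x.e_2)) \longrightarrow (\mu,\, e_i[v/x]) \\
(\mu,\, \mathrm{unfold}\; (\mathrm{fold}\; v)) \longrightarrow (\mu,\, v) \\
(\mu,\, (\Lambda\alpha : k.e)\; \tau) \longrightarrow (\mu,\, e[\tau/\alpha]) \\
(\mu,\, \mathrm{fix}\; x : \tau.e) \longrightarrow (\mu,\, e[\mathrm{fix}\; x : \tau.e/x])
\end{array}
\]

\[
\frac{\neg(l \mapsto \_ \in \mu)}{(\mu,\, \mathrm{ref}\; v) \longrightarrow ((\mu,\, l \mapsto v),\, l)}
\qquad
\frac{l \mapsto \_ \in \mu}{(\mu,\, l := v) \longrightarrow (\mu[l := v],\, ())}
\qquad
\frac{l \mapsto v \in \mu}{(\mu,\, !l) \longrightarrow (\mu,\, v)}
\]

\[
\frac{\exists \sigma_\Psi.(\bullet \vdash \sigma_\Psi : \lceil \Psi \rceil_{0} \;\wedge\; \lceil T' \rceil_{0,|\Psi|} \cdot \sigma_\Psi = T)}{(\mu,\, \mathrm{unify}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; (\Psi.T' \mapsto e')) \longrightarrow (\mu,\, \mathrm{inj}_1\; (\lceil e' \rceil_{0,|\Psi|} \cdot \sigma_\Psi))}
\]

\[
\frac{\nexists \sigma_\Psi.(\bullet \vdash \sigma_\Psi : \lceil \Psi \rceil_{0} \;\wedge\; \lceil T' \rceil_{0,|\Psi|} \cdot \sigma_\Psi = T)}{(\mu,\, \mathrm{unify}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; (\Psi.T' \mapsto e')) \longrightarrow (\mu,\, \mathrm{inj}_2\; ())}
\]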

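$(l \mapsto v) \in \mu$:

\[
\begin{array}{l}
(l \mapsto v) \in (\mu,\, l \mapsto v) \\
(l \mapsto v) \in (\mu,\, l' \mapsto v') \;\Leftarrow\; (l \mapsto v) \in \mu
\end{array}
\]

$(l : \tau) \in \Sigma$:

\[
\begin{array}{l}
(l : \tau) \in (\Sigma,\, l : \tau) \\
(l : \tau) \in (\Sigma,\, l' : \tau') \;\Leftarrow\; (l : \tau) \in \Sigma
\end{array}
\]

$\mu \sim \Sigma$:

\[
\begin{array}{l}
(l \mapsto v) \in \mu \;\Rightarrow\; \exists \tau.(l : \tau) \in \Sigma \;\wedge\; \bullet; \Sigma; \bullet \vdash v : \tau \\
(l : \tau) \in \Sigma \;\Rightarrow\; \exists v.(l \mapsto v) \in \mu \;\wedge\; \bullet; \Sigma; \bullet \vdash v : \tau
\end{array}
\]

$\Sigma \subseteq \Sigma'$:

\[
(l : \tau) \in \Sigma \;\Rightarrow\; (l : \tau) \in \Sigma'
\]

$\mu[l := v]$:

\[
\begin{array}{lcl}
(\mu,\, l' \mapsto v')[l := v] & = & \mu[l := v],\, l' \mapsto v' \\
(\mu,\, l \mapsto v')[l := v] & = & \mu,\, l \mapsto v
\end{array}
\]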
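The store-manipulating rules can be sketched as a small interpreter. This is only an illustration under stated assumptions: terms are encoded as Python tuples, only the unit, pair, `ref`, `!` and `:=` fragment is covered, the store is a dictionary from integer locations to values, and all function names are hypothetical.

```python
def is_value(e):
    """Values of the fragment: unit, locations, and pairs of values."""
    if e[0] in ("unit", "loc"):
        return True
    if e[0] == "pair":
        return is_value(e[1]) and is_value(e[2])
    return False


def step_inside(store, e, pos):
    """Step the subterm at position pos (the evaluation-context rule)."""
    result = step(store, e[pos])
    if result is None:
        return None
    new_store, sub = result
    return new_store, e[:pos] + (sub,) + e[pos + 1:]


def step(store, e):
    """Return (store', e') after one step, or None if e is a value or stuck."""
    tag = e[0]
    if tag in ("pair", "ref", "deref", "assign"):
        # Evaluate subterms left to right before firing a rule.
        for pos in range(1, len(e)):
            if not is_value(e[pos]):
                return step_inside(store, e, pos)
    if tag == "ref":
        loc = max(store, default=-1) + 1        # a location not in the store
        return {**store, loc: e[1]}, ("loc", loc)
    if tag == "deref" and e[1][0] == "loc" and e[1][1] in store:
        return store, store[e[1][1]]
    if tag == "assign" and e[1][0] == "loc" and e[1][1] in store:
        return {**store, e[1][1]: e[2]}, ("unit",)
    return None


def run(store, e):
    """Iterate step until no rule applies."""
    while (result := step(store, e)) is not None:
        store, e = result
    return store, e
```

For instance, `run({}, ("deref", ("ref", ("unit",))))` first allocates a fresh location holding unit and then reads it back.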


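Lemma C.7 (Computational substitution commutes with logic operations)

1. $\lceil \tau[\tau'/\alpha] \rceil^{M}_{N,K} = \lceil \tau \rceil^{M}_{N,K}[\lceil \tau' \rceil^{M}_{N,K}/\alpha]$

2. $\lfloor \tau[\tau'/\alpha] \rfloor^{M}_{N,K} = \lfloor \tau \rfloor^{M}_{N,K}[\lfloor \tau' \rfloor^{M}_{N,K}/\alpha]$

3. $(\tau[\tau'/\alpha]) \cdot \sigma_\Psi = \tau \cdot \sigma_\Psi[\tau' \cdot \sigma_\Psi/\alpha]$

4. $\lceil e[\tau/\alpha] \rceil^{M}_{N,K} = \lceil e \rceil^{M}_{N,K}[\lceil \tau \rceil^{M}_{N,K}/\alpha]$

5. $\lfloor e[\tau/\alpha] \rfloor^{M}_{N,K} = \lfloor e \rfloor^{M}_{N,K}[\lfloor \tau \rfloor^{M}_{N,K}/\alpha]$

6. $(e[\tau/\alpha]) \cdot \sigma_\Psi = e \cdot \sigma_\Psi[\tau \cdot \sigma_\Psi/\alpha]$

7. $\lceil e[e'/x] \rceil^{M}_{N,K} = \lceil e \rceil^{M}_{N,K}[\lceil e' \rceil^{M}_{N,K}/x]$

8. $\lfloor e[e'/x] \rfloor^{M}_{N,K} = \lfloor e \rfloor^{M}_{N,K}[\lfloor e' \rfloor^{M}_{N,K}/x]$

9. $(e[e'/x]) \cdot \sigma_\Psi = e \cdot \sigma_\Psi[e' \cdot \sigma_\Psi/x]$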


Simple  by  induction. 

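Lemma C.8 (Compatibility of $\beta$ conversion with logic operations) If $\tau =_\beta \tau'$ then:

1. $\lceil \tau \rceil^{M}_{N,K} =_\beta \lceil \tau' \rceil^{M}_{N,K}$

2. $\lfloor \tau \rfloor^{M}_{N,K} =_\beta \lfloor \tau' \rfloor^{M}_{N,K}$

3. $\tau \cdot \sigma_\Psi =_\beta \tau' \cdot \sigma_\Psi$

In all cases it's trivially provable by expansion for $\tau = (\lambda\alpha : k.\tau_1)\,\tau_2$ and $\tau' = \tau_1[\tau_2/\alpha]$ and use of lemma C.7.
The congruence cases for other $\tau$, $\tau'$ are provable by induction hypothesis.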

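Lemma C.9 Assuming $|\sigma_\Psi| = N$, and that $\lceil \cdot \rceil^{M}_{N,K}$ and $\lceil \cdot \rceil^{M}_{N',K}$ are well-defined for their respective arguments, we have:

1. $\lceil k \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil k \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$

2. $\lceil \tau \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil \tau \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$

3. $\lceil e \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil e \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$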


By  structural  induction.  We  prove  only  the  interesting  cases. 

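Part 1 When $k = \Pi(\mathcal{K}).k'$, we have that the left-hand-side is equal to:

$\lceil \Pi(\mathcal{K} \cdot \sigma_\Psi).(k' \cdot \sigma_\Psi) \rceil^{M}_{N',K} = \Pi(\lceil \mathcal{K} \cdot \sigma_\Psi \rceil^{M}_{N',K}).\lceil k' \cdot \sigma_\Psi \rceil^{M+1}_{N',K}$

We have by lemma B.103 that $\lceil \mathcal{K} \cdot \sigma_\Psi \rceil^{M}_{N',K} = \lceil \mathcal{K} \rceil^{M}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$.

By induction hypothesis we have that $\lceil k' \cdot \sigma_\Psi \rceil^{M+1}_{N',K} = \lceil k' \rceil^{M+1}_{N,K} \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1})$.

After expansion of freshening for the right-hand-side, we see that it is equal to the above.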

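Part 3 The most interesting case occurs when $e = \mathrm{unify}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; (\Psi.T' \mapsto e')$. We have that the left-hand-side is equal to:

$\lceil \mathrm{unify}\; T \cdot \sigma_\Psi\; \mathrm{return}\; (.\tau \cdot \sigma_\Psi)\; \mathrm{with}\; (\Psi \cdot \sigma_\Psi.T' \cdot \sigma_\Psi \mapsto e' \cdot \sigma_\Psi) \rceil^{M}_{N',K}$

By expansion of the definition of freshening we get that this is equal to:

$\mathrm{unify}\; \lceil T \cdot \sigma_\Psi \rceil^{M}_{N',K}\; \mathrm{return}\; (.\lceil \tau \cdot \sigma_\Psi \rceil^{M+1}_{N',K})\; \mathrm{with}\; (\lceil \Psi \cdot \sigma_\Psi \rceil^{M}_{N',K}.\lceil T' \cdot \sigma_\Psi \rceil^{M+|\Psi \cdot \sigma_\Psi|}_{N',K} \mapsto \lceil e' \cdot \sigma_\Psi \rceil^{M+|\Psi \cdot \sigma_\Psi|}_{N',K})$

The right-hand-side is equal to (assuming $\sigma'_\Psi = \sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}$):

$\mathrm{unify}\; \lceil T \rceil^{M}_{N,K} \cdot \sigma'_\Psi\; \mathrm{return}\; (.\lceil \tau \rceil^{M+1}_{N,K} \cdot \sigma'_\Psi)\; \mathrm{with}\; (\lceil \Psi \rceil^{M}_{N,K} \cdot \sigma'_\Psi.\lceil T' \rceil^{M+|\Psi|}_{N,K} \cdot \sigma'_\Psi \mapsto \lceil e' \rceil^{M+|\Psi|}_{N,K} \cdot \sigma'_\Psi)$

In all cases, the respective terms match, by use of induction hypothesis, lemma B.103, and also the fact that
$|\Psi| = |\Psi \cdot \sigma_\Psi|$.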
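Lemma C.10 Assuming $|\sigma_\Psi| = N$, and that $\lfloor \cdot \rfloor^{M}_{N,K}$ and $\lfloor \cdot \rfloor^{M}_{N',K}$ are well-defined for their respective arguments, we have:

1. $\lfloor k \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor k \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

2. $\lfloor \tau \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor \tau \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

3. $\lfloor e \cdot (\sigma_\Psi,\, X_{N'},\, \cdots,\, X_{N'+K-1}) \rfloor^{M}_{N',K} = \lfloor e \rfloor^{M}_{N,K} \cdot \sigma_\Psi$

Similarly as above, and use of lemma B.104.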

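Lemma C.11 1. $(k \cdot \sigma_\Psi) \cdot \sigma'_\Psi = k \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$

2. $(\tau \cdot \sigma_\Psi) \cdot \sigma'_\Psi = \tau \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$

3. $(e \cdot \sigma_\Psi) \cdot \sigma'_\Psi = e \cdot (\sigma_\Psi \cdot \sigma'_\Psi)$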

By  induction,  and  use  of  lemma  B.94. 
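The shape of this associativity property can be seen on a toy first-order term language. The sketch below is an assumption-laden illustration: it has no binders (so no freshening is involved), terms are tuples, a substitution is a Python list indexed by variable number, and the names `apply_subst` and `compose` are hypothetical.

```python
def apply_subst(t, sigma):
    """Apply substitution sigma (a list of terms) to term t.

    Terms are ("X", i) for the i-th extension variable, or
    ("node", head, args) for a constructor applied to a tuple of terms.
    """
    if t[0] == "X":
        return sigma[t[1]]
    head, args = t[1], t[2]
    return ("node", head, tuple(apply_subst(a, sigma) for a in args))


def compose(sigma, sigma2):
    """Composition: apply sigma2 to every term in sigma."""
    return [apply_subst(t, sigma2) for t in sigma]
```

Applying two substitutions in sequence then agrees with applying their composition, which is the statement of the lemma restricted to this fragment.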

Lemma  C.12  ^  then: 

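1. If $k \cdot \sigma_\Psi$ is defined, then $k \cdot \sigma_\Psi = k \cdot \sigma'_\Psi$

2. If $\tau \cdot \sigma_\Psi$ is defined, then $\tau \cdot \sigma_\Psi = \tau \cdot \sigma'_\Psi$

3. If $e \cdot \sigma_\Psi$ is defined, then $e \cdot \sigma_\Psi = e \cdot \sigma'_\Psi$

4. If $\Gamma \cdot \sigma_\Psi$ is defined, then $\Gamma \cdot \sigma_\Psi = \Gamma \cdot \sigma'_\Psi$

Most are trivial based on induction and use of lemma B.92.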

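Lemma C.13 1. If $\Psi \vdash k$ wf and $\Psi' \vdash \sigma_\Psi : \Psi$ then $\Psi' \vdash k \cdot \sigma_\Psi$ wf.

2. If $\Psi; \Gamma \vdash \tau : k$ and $\Psi' \vdash \sigma_\Psi : \Psi$ then $\Psi'; \Gamma \cdot \sigma_\Psi \vdash \tau \cdot \sigma_\Psi : k \cdot \sigma_\Psi$.

3. If $\Psi; \Sigma; \Gamma \vdash e : \tau$ and $\Psi' \vdash \sigma_\Psi : \Psi$ then $\Psi'; \Sigma; \Gamma \cdot \sigma_\Psi \vdash e \cdot \sigma_\Psi : \tau \cdot \sigma_\Psi$.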

We  only  prove  the  interesting  cases. 


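Part 1 Case $\dfrac{\vdash \Psi,\, \mathcal{K} \text{ wf} \qquad \Psi,\, \mathcal{K} \vdash \lceil k \rceil_{|\Psi|,1} \text{ wf}}{\Psi \vdash \Pi(\mathcal{K}).k \text{ wf}}$ $\rhd$

We use the induction hypothesis for $\Psi',\, \mathcal{K} \cdot \sigma_\Psi \vdash (\sigma_\Psi,\, X_{|\Psi'|}) : \Psi,\, \mathcal{K}$ and $\lceil k \rceil_{|\Psi|,1}$ to get:
$\Psi',\, \mathcal{K} \cdot \sigma_\Psi \vdash \lceil k \rceil_{|\Psi|,1} \cdot (\sigma_\Psi,\, X_{|\Psi'|})$ wf.

From lemma C.9 we have that $\lceil k \rceil_{|\Psi|,1} \cdot (\sigma_\Psi,\, X_{|\Psi'|}) = \lceil k \cdot \sigma_\Psi \rceil_{|\Psi'|,1}$.

Therefore by use of the same typing rule we have the desired result.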


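Part 2 Case $\dfrac{\Psi,\, \mathcal{K};\; \Gamma \vdash \lceil \tau \rceil_{|\Psi|,1} : k}{\Psi;\; \Gamma \vdash \lambda(\mathcal{K}).\tau : \Pi(\mathcal{K}).\lfloor k \rfloor_{|\Psi|,1}}$ $\rhd$

We use the induction hypothesis for $\Psi',\, \mathcal{K} \cdot \sigma_\Psi \vdash (\sigma_\Psi,\, X_{|\Psi'|}) : \Psi,\, \mathcal{K}$ and $\lceil \tau \rceil_{|\Psi|,1}$ to get, together with lemma C.9:
$\Psi',\, \mathcal{K} \cdot \sigma_\Psi;\; \Gamma \cdot (\sigma_\Psi,\, X_{|\Psi'|}) \vdash \lceil \tau \cdot \sigma_\Psi \rceil_{|\Psi'|,1} : k \cdot (\sigma_\Psi,\, X_{|\Psi'|})$

By C.12 and the fact that $\Psi \vdash \Gamma$ wf, we have that $\Gamma \cdot (\sigma_\Psi,\, X_{|\Psi'|}) = \Gamma \cdot \sigma_\Psi$, so:

$\Psi',\, \mathcal{K} \cdot \sigma_\Psi;\; \Gamma \cdot \sigma_\Psi \vdash \lceil \tau \cdot \sigma_\Psi \rceil_{|\Psi'|,1} : k \cdot (\sigma_\Psi,\, X_{|\Psi'|})$

By use of the same typing rule we get:

$\Psi';\; \Gamma \cdot \sigma_\Psi \vdash \lambda(\mathcal{K} \cdot \sigma_\Psi).(\tau \cdot \sigma_\Psi) : \Pi(\mathcal{K} \cdot \sigma_\Psi).(\lfloor k \cdot (\sigma_\Psi,\, X_{|\Psi'|}) \rfloor_{|\Psi'|,1})$.

We have that $\lfloor k \cdot (\sigma_\Psi,\, X_{|\Psi'|}) \rfloor_{|\Psi'|,1} = \lfloor k \rfloor_{|\Psi|,1} \cdot \sigma_\Psi$ by lemma C.10, so this is the desired result.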
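Case $\dfrac{\Psi;\; \Gamma \vdash \tau : \Pi(\mathcal{K}).k \qquad \Psi \vdash T : \mathcal{K}}{\Psi;\; \Gamma \vdash \tau\; T : \lceil k \rceil_{|\Psi|,1} \cdot (\mathrm{id}_\Psi,\, T)}$ $\rhd$

By induction hypothesis we have:

$\Psi';\; \Gamma \cdot \sigma_\Psi \vdash \tau \cdot \sigma_\Psi : \Pi(\mathcal{K} \cdot \sigma_\Psi).(k \cdot \sigma_\Psi)$

By use of B.97 for $T$ we have:

$\Psi' \vdash T \cdot \sigma_\Psi : \mathcal{K} \cdot \sigma_\Psi$

By use of the same typing rule we get:

$\Psi';\; \Gamma \cdot \sigma_\Psi \vdash (\tau \cdot \sigma_\Psi)\; (T \cdot \sigma_\Psi) : \lceil k \cdot \sigma_\Psi \rceil_{|\Psi'|,1} \cdot (\mathrm{id}_{\Psi'},\, T \cdot \sigma_\Psi)$

Now we need to prove that $\lceil k \cdot \sigma_\Psi \rceil_{|\Psi'|,1} \cdot (\mathrm{id}_{\Psi'},\, T \cdot \sigma_\Psi) = (\lceil k \rceil_{|\Psi|,1} \cdot (\mathrm{id}_\Psi,\, T)) \cdot \sigma_\Psi$.

From lemma C.9, we get that the left-hand side is equal to:

$(\lceil k \rceil_{|\Psi|,1} \cdot (\sigma_\Psi,\, X_{|\Psi'|})) \cdot (\mathrm{id}_{\Psi'},\, T \cdot \sigma_\Psi)$.

By application of lemma C.11 we get that it is further equal to:

$\lceil k \rceil_{|\Psi|,1} \cdot ((\sigma_\Psi,\, X_{|\Psi'|}) \cdot (\mathrm{id}_{\Psi'},\, T \cdot \sigma_\Psi))$

By application of the same lemma to the right-hand side we have that it is equal to: $\lceil k \rceil_{|\Psi|,1} \cdot ((\mathrm{id}_\Psi,\, T) \cdot \sigma_\Psi)$.

Thus we only need to prove that $(\sigma_\Psi,\, X_{|\Psi'|}) \cdot (\mathrm{id}_{\Psi'},\, T \cdot \sigma_\Psi) = (\mathrm{id}_\Psi,\, T) \cdot \sigma_\Psi$.

We have that the left-hand side is equal to:

$\sigma_\Psi \cdot (\mathrm{id}_{\Psi'},\, T \cdot \sigma_\Psi),\; T \cdot \sigma_\Psi = \sigma_\Psi \cdot \mathrm{id}_{\Psi'},\; T \cdot \sigma_\Psi$ by lemma B.92.

Furthermore by lemma B.96 we have that $\sigma_\Psi \cdot \mathrm{id}_{\Psi'} = \sigma_\Psi$.

The right-hand side is equal to:

$\mathrm{id}_\Psi \cdot \sigma_\Psi,\; T \cdot \sigma_\Psi = \sigma_\Psi,\; T \cdot \sigma_\Psi$ due to lemma B.95.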


Part  3  Most  cases  are  proved  as  above,  using  the  above  lemmas.  The  most  difficult  case  is  the  pattern  matching 
construct. 

^hT:K  Fh  [xl|^l  1  :*  'B  h  wf 

'B,  ['B"]|,j,|h  'B,  ['B"]|>j,|;r;  Fh  :  rxl|vj,ii-(id>i., 

O'SSC _ ^  ^ ^  ^ ^  ^ ^  ^ ^  ^  ^  ^ ^  ^ ^  ^ 

*B;  Z;  F  h  unify  T  return  (.x)  with  (*B".r'  ^  e)\  { [x]  nj,|  j  •  (id>i<,  T))  -|-  unit 

From  *B  h  r  :  and  lemma  B.97  we  have: 

'B'h  r-a^ 

From  *B,  F  h  [x]  |,j,|  j  *B  h  F  wf,  part  2  and  lemma  C.9  we  have: 

*B',  K  ■  avp;  F  •  a>i<  h  [x  •  a>i<]  |,j„|  j  :  * 

From  *B  h  |'*B"]  |^|  wf  and  lemmas  B.98  and  B.106  we  have: 

'B'h  [»B"-avi,]|^,|  wf 

From'B,  |'*B"]nj,| ;  I;  Fh  Te'] |'i'|,|>r"'| :  M |'i>|,r  (id'i',  r'),a(j,  =  a^,  •••  ,  Xnf,/|+|.j///.o,j,|_i  and'B',  h 

a(j; :  (*B,  [*B"]  nj,|),  lemma  B.97,  lemma  B.i03,  and  lemma  B.92  we  have: 

*B',  |'*B" -cjvj,]  h 

Similarly  for  the  same  a(j,,  and  from  *B,  [*B"]nj,| ;  Z;  F  h  [e']  •  M|>i'|.i  ‘  (id'i',  rF']nj,|  nj„,|),  lemma  C.9 

and  induction  hypothesis,  we  get: 

'B  ,  ['B  •  CJvj/]  hj/;|  ;  Z;  F •  CJvj/ h  [e  •  :  ( [x]  hj;|  J  •  (idvfi,  [T  ]|vj,|jv[<«|)) 'Ciiji- 

Thus  we  only  need  to  prove  that  ([xluj,!  j  •  (id>i<,  [T']  |^l  l^„|))  •a(j,  =  [x  •  a>i<]  |^,|  j  •  (id>j//,  [T'  -avj,] 

In  that  case  we  will  use  the  same  typing  rule  to  get  the  desired  result,  using  a  similar  proof  as  this  last  step,  to 
go  from  |'x-avi/]|^,|j  •  (id>i</,  T to  ([xlupij  •  (id>i-,  T))  -axi-. 
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So  we  now  prove  ([xluj,!  1  •  (id>i<,  [r']  |^l_l^„|)) -alp  = 

By  lemma  C.9  and  lemma  B.103,  we  have  that  the  right-hand  side  is  equal  to: 

( M  l't'1,1  ■  ^|'t"|))  ■  (idip',  \T'~\  ivpi.ivp"!  ■  ^vp)- 

By  applieation  oflemmaC.il  we  see  that  both  sides  are  equal  if  Xnj</|  )-(id>j//,  [r']  |\j)|  ‘  — 

(id^ ,  I" 7^  1 1^1 

The  left-hand  side  of  this  is  equal  to  \T^~\  HJ/|  ;  \r] 

1^1 

By  lemma  B.92  and  B.96  we  get  that  this  is  further  equal  to  a>r<,  \T’~\ 

1^1 

The  right-hand  side  is  equal  to  id>r<  •  a(j,,  [ T']  nj,|  nj,«|  •  a(j,,  whieh  is  equal  to  the  above  using  lemmas 
B.92  and  B.95. 

Lemma C.14 1. $\lfloor \lceil k \rceil^{M}_{N,K} \rfloor^{M}_{N,K} = k$

Trivial by induction and use of lemma B.105.

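Lemma C.15 (Substitution) 1. If $\Psi,\, \Psi';\; \Gamma,\, \alpha' : k',\, \Gamma' \vdash \tau : k$ and $\Psi;\; \Gamma \vdash \tau' : k'$ then $\Psi,\, \Psi';\; \Gamma,\, \Gamma'[\tau'/\alpha'] \vdash \tau[\tau'/\alpha'] : k$.

2. If $\Psi,\, \Psi';\; \Sigma;\; \Gamma,\, \alpha' : k',\, \Gamma' \vdash e : \tau$ and $\Psi;\; \Gamma \vdash \tau' : k'$ then $\Psi,\, \Psi';\; \Sigma;\; \Gamma,\, \Gamma'[\tau'/\alpha'] \vdash e[\tau'/\alpha'] : \tau[\tau'/\alpha']$.

3. If $\Psi,\, \Psi';\; \Sigma;\; \Gamma,\, x' : \tau',\, \Gamma' \vdash e : \tau$ and $\Psi;\; \Sigma;\; \Gamma \vdash e' : \tau'$ then $\Psi,\, \Psi';\; \Sigma;\; \Gamma,\, \Gamma' \vdash e[e'/x'] : \tau$.

Easily proved by structural induction on the typing derivations.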

Let us now proceed to prove the main preservation theorem.

Theorem C.16 (Preservation) If $\bullet; \Sigma; \bullet \vdash e : \tau$, $\mu \sim \Sigma$, $(\mu, e) \longrightarrow (\mu', e')$ then there exists $\Sigma'$ such that
$\Sigma \subseteq \Sigma'$, $\mu' \sim \Sigma'$ and $\bullet; \Sigma'; \bullet \vdash e' : \tau$.

Proceed by induction on the derivation of $(\mu, e) \longrightarrow (\mu', e')$. When we don't specify a different $\mu'$, we have
that $\mu' = \mu$, with the desired properties obviously holding.


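Case $\dfrac{(\mu, e) \longrightarrow (\mu', e')}{(\mu, E[e]) \longrightarrow (\mu', E[e'])}$ $\rhd$

By induction hypothesis for $(\mu, e) \longrightarrow (\mu', e')$ we get a $\Sigma'$ such that $\Sigma \subseteq \Sigma'$, $\mu' \sim \Sigma'$ and $\bullet; \Sigma'; \bullet \vdash e' : \tau$. By
inversion of typing for $E[e]$ and re-application of the same typing rule for $E[e']$ we get that $\bullet; \Sigma'; \bullet \vdash E[e'] : \tau$.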


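Case $(\mu,\, (\Lambda(\mathcal{K}).e)\; T) \longrightarrow (\mu,\, \lceil e \rceil_{0,1} \cdot T)$ $\rhd$

By inversion of typing we get:

$\bullet; \Sigma; \bullet \vdash \Lambda(\mathcal{K}).e : \Pi(\mathcal{K}).\tau'$

$\bullet \vdash T : \mathcal{K}$

$\tau = \lceil \tau' \rceil_{0,1} \cdot T$

By further typing inversion for $\Lambda(\mathcal{K}).e$ we get:

$\bullet,\, \mathcal{K};\; \Sigma;\; \bullet \vdash \lceil e \rceil_{0,1} : \tau''$, with $\tau' = \lfloor \tau'' \rfloor_{0,1}$

For $\sigma_\Psi = \bullet,\, T$ we have $\bullet \vdash (\bullet,\, T) : (\bullet,\, \mathcal{K})$ trivially from the above.
By lemma C.13 for $\sigma_\Psi$ we get that:
$\bullet; \Sigma; \bullet \vdash \lceil e \rceil_{0,1} \cdot T : \tau'' \cdot T$.

Now it remains to show that $\tau'' \cdot T = \lceil \lfloor \tau'' \rfloor_{0,1} \rceil_{0,1} \cdot T$, which is proved by C.14.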

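Case $(\mu,\, \mathrm{unpack}\; (\mathrm{pack}\; T\; \mathrm{return}\; (.\tau'')\; \mathrm{with}\; v)\; (.)x.(e')) \longrightarrow (\mu,\, (\lceil e' \rceil_{0,1} \cdot T)[v/x])$ $\rhd$

By inversion of typing we get:

$\bullet; \Sigma; \bullet \vdash \mathrm{pack}\; T\; \mathrm{return}\; (.\tau'')\; \mathrm{with}\; v : \Sigma(\mathcal{K}).\tau'$

$\bullet,\, \mathcal{K};\; \Sigma;\; x : \lceil \tau' \rceil_{0,1} \vdash \lceil e' \rceil_{0,1} : \tau$

$\bullet; \bullet \vdash \tau : *$

By further typing inversion for $\mathrm{pack}\; T\; \mathrm{return}\; (.\tau'')\; \mathrm{with}\; v$ we get:

$\tau'' = \tau'$

$\bullet \vdash T : \mathcal{K}$

$\bullet,\, \mathcal{K};\; \bullet \vdash \lceil \tau' \rceil_{0,1} : *$

$\bullet; \Sigma; \bullet \vdash v : \lceil \tau' \rceil_{0,1} \cdot (T)$

First by lemma C.13 for $e'$, $\sigma_\Psi = T$ we get:

$\bullet; \Sigma;\; x : \lceil \tau' \rceil_{0,1} \cdot T \vdash \lceil e' \rceil_{0,1} \cdot T : \tau \cdot T$.

Second by lemma C.12 for $\tau$ we get that $\tau \cdot T = \tau$.

Thus $\bullet; \Sigma;\; x : \lceil \tau' \rceil_{0,1} \cdot T \vdash \lceil e' \rceil_{0,1} \cdot T : \tau$.

Furthermore by lemma C.15 for $[v/x]$ we get that $\bullet; \Sigma; \bullet \vdash (\lceil e' \rceil_{0,1} \cdot T)[v/x] : \tau$, which is the desired.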

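Case $(\mu,\, (\lambda x : \tau'.e)\; v) \longrightarrow (\mu,\, e[v/x])$ $\rhd$

By inversion of typing we get:

$\bullet; \Sigma; \bullet \vdash \lambda x : \tau'.e : \tau' \to \tau$

$\bullet; \Sigma; \bullet \vdash v : \tau'$

By further typing inversion for $\lambda x : \tau'.e$ we get:

$\bullet; \Sigma;\; x : \tau' \vdash e : \tau$

By lemma C.15 for $[v/x]$ we get:

$\bullet; \Sigma; \bullet \vdash e[v/x] : \tau$, which is the desired.

Case $(\mu,\, \mathrm{proj}_i\,(v_1,\, v_2)) \longrightarrow (\mu,\, v_i)$ $\rhd$

By typing inversion we get:

$\bullet; \Sigma; \bullet \vdash (v_1,\, v_2) : \tau_1 \times \tau_2$
$\tau = \tau_i$

By further inversion for $(v_1,\, v_2)$ we have:

$\bullet; \Sigma; \bullet \vdash v_i : \tau_i$, which is the desired.

Case $(\mu,\, \mathrm{case}(\mathrm{inj}_i\; v,\, x.e_1,\, x.e_2)) \longrightarrow (\mu,\, e_i[v/x])$ $\rhd$

By typing inversion we get:

$\bullet; \Sigma; \bullet \vdash v : \tau_i$

$\bullet; \Sigma;\; x : \tau_i \vdash e_i : \tau$

Using the lemma C.15 for $[v/x]$ we get:

$\bullet; \Sigma; \bullet \vdash e_i[v/x] : \tau$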

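Case $(\mu,\, \mathrm{unfold}\; (\mathrm{fold}\; v)) \longrightarrow (\mu,\, v)$ $\rhd$

By inversion we get: $\bullet; \bullet \vdash \mu\alpha : k.\tau' : k$
$\bullet; \Sigma; \bullet \vdash \mathrm{fold}\; v : (\mu\alpha : k.\tau')\; \tau_1\; \tau_2 \cdots \tau_n$

$\tau = \tau'[\mu\alpha : k.\tau'/\alpha]\; \tau_1\; \tau_2 \cdots \tau_n$

By further typing inversion for $\mathrm{fold}\; v$:

$\bullet; \Sigma; \bullet \vdash v : \tau'[\mu\alpha : k.\tau'/\alpha]\; \tau_1\; \tau_2 \cdots \tau_n$

Which is the desired.

Case $\dfrac{\neg(l \mapsto \_ \in \mu)}{(\mu,\, \mathrm{ref}\; v) \longrightarrow ((\mu,\, l \mapsto v),\, l)}$ $\rhd$

By typing inversion we get: $\bullet; \Sigma; \bullet \vdash v : \tau'$, with $\tau = \mathrm{ref}\; \tau'$.

For $\Sigma' = \Sigma,\, l : \tau'$ and $\mu' = \mu,\, l \mapsto v$ we have that $\mu' \sim \Sigma'$ and $\bullet; \Sigma'; \bullet \vdash l : \mathrm{ref}\; \tau'$.

Case $\dfrac{l \mapsto \_ \in \mu}{(\mu,\, l := v) \longrightarrow (\mu[l := v],\, ())}$ $\rhd$

By typing inversion get:

$\bullet; \Sigma; \bullet \vdash l : \mathrm{ref}\; \tau'$
$\bullet; \Sigma; \bullet \vdash v : \tau'$

Thus for $\mu' = \mu[l := v]$ we have that $\mu' \sim \Sigma$ and $\bullet; \Sigma; \bullet \vdash () : \mathrm{unit}$.

Case $\dfrac{l \mapsto v \in \mu}{(\mu,\, !l) \longrightarrow (\mu,\, v)}$ $\rhd$

By typing inversion get: $\bullet; \Sigma; \bullet \vdash l : \mathrm{ref}\; \tau$
By inversion of $\mu \sim \Sigma$ get:

$\bullet; \Sigma; \bullet \vdash v : \tau$, which is the desired.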

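Case $(\mu,\, (\Lambda\alpha : k.e)\; \tau'') \longrightarrow (\mu,\, e[\tau''/\alpha])$ $\rhd$

By typing inversion we get:

$\bullet; \Sigma; \bullet \vdash \Lambda\alpha : k.e : \Pi\alpha : k.\tau'$

$\bullet; \bullet \vdash \tau'' : k$

$\tau = \tau'[\tau''/\alpha]$

By further typing inversion for $\Lambda\alpha : k.e$ we get:

$\bullet; \Sigma;\; \alpha : k \vdash e : \tau'$

Using the lemma C.15 for $[\tau''/\alpha]$ we get:

$\bullet; \Sigma; \bullet \vdash e[\tau''/\alpha] : \tau'[\tau''/\alpha]$, which is the desired.

Case $(\mu,\, \mathrm{fix}\; x : \tau.e) \longrightarrow (\mu,\, e[\mathrm{fix}\; x : \tau.e/x])$ $\rhd$

By typing inversion get:

$\bullet; \Sigma;\; x : \tau \vdash e : \tau$

By application of the lemma C.15 for $[\mathrm{fix}\; x : \tau.e/x]$ we get:

$\bullet; \Sigma; \bullet \vdash e[\mathrm{fix}\; x : \tau.e/x] : \tau$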


3avj/.(»  h  A  [r'Jg  -aii- =  r) 

O'SSC  _ ’  ^  ^ _ 

( ,  unify  T  return  (.x')  with  i-a  e) )  — ^  ( /r ,  injj  ( |"e']  ^  |^,|  •  avp) ) 

By  inversion  of  typing  we  get: 

•  h  r 

•  ,  iSf;  Z;  -h  [x'loi 

•  h  wf 

'^  =  (Mo.f^)  +  unit 

By  application  of  lemma  C.15  for  avj/  and  \e'^^Q  nj,/|  we  get: 
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•;  Z;  •  h  [e  ]  oj\j</|  •  CT>j(  :(|'x]qj-7’)  -  cjv[<. 

All  we  now  need  to  prove  is  that  [x']q  j  •  T  =  (  [x']q  j  •  T')  •  a>i<. 

Using  the  lemma  C.l  1  we  get  that: 

([xHon  •  T')-o^  =  rx'lo,!  •  {T'-o^)  =  \x'\,-T 

It  is  now  easy  to  eomplete  the  desired  result  using  the  typing  rule  for  injj. 

h  :  |''F]q  A  ["r'jQHp, -aii- =  T) 

O'Ssc _ ‘  ^  ^ _  P>, 

$(\mu,\, \mathrm{unify}\; T\; \mathrm{return}\; (.\tau)\; \mathrm{with}\; (\Psi.T' \mapsto e')) \longrightarrow (\mu,\, \mathrm{inj}_2\; ())$

Trivial by application of the typing rule for $\mathrm{inj}_2$.
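Preservation can be observed experimentally on a very small fragment. The following sketch is an illustration only, under loud assumptions: it covers just unit, pairs and projections (no store, no extension terms), encodes terms and types as tuples, and uses hypothetical function names. It checks that every intermediate term of a reduction sequence has the type of the initial term.

```python
def typeof(e):
    """Type of a closed term in the unit/pair/projection fragment."""
    tag = e[0]
    if tag == "unit":
        return ("unit",)
    if tag == "pair":
        return ("prod", typeof(e[1]), typeof(e[2]))
    if tag == "proj":                   # ("proj", i, e) with i in {1, 2}
        ty = typeof(e[2])
        if ty[0] != "prod":
            raise TypeError("projection from a non-product")
        return ty[e[1]]
    raise TypeError("unknown term")


def is_value(e):
    return e[0] == "unit" or (
        e[0] == "pair" and is_value(e[1]) and is_value(e[2]))


def step(e):
    """One small step on a well-typed term, or None if e is a value."""
    tag = e[0]
    if tag == "pair":
        for pos in (1, 2):
            if not is_value(e[pos]):
                return e[:pos] + (step(e[pos]),) + e[pos + 1:]
        return None
    if tag == "proj":
        if not is_value(e[2]):
            return ("proj", e[1], step(e[2]))
        return e[2][e[1]]               # proj_i (v1, v2) -> v_i
    return None


def preserves_type(e):
    """Reduce e to a value, checking the type is unchanged at every step."""
    ty = typeof(e)
    while (nxt := step(e)) is not None:
        if typeof(nxt) != ty:
            return False
        e = nxt
    return is_value(e)
```

Such a check is of course no substitute for the proof; it only exercises the statement on concrete terms.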

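Lemma C.17 (Canonical forms) If $\bullet; \Sigma; \bullet \vdash v : \tau$ then:

1. If $\tau = \Pi(\mathcal{K}).\tau'$, then $\exists e$ such that $v = \Lambda(\mathcal{K}).e$.

2. If $\tau = \Sigma(\mathcal{K}).\tau'$, then $\exists T, v'$ such that $v = \mathrm{pack}\; T\; \mathrm{return}\; (.\tau'')\; \mathrm{with}\; v'$ with $\tau' =_\beta \tau''$.

3. If $\tau = \mathrm{unit}$, then $v = ()$.

4. If $\tau = \tau_1 \to \tau_2$, then $\exists e$ such that $v = \lambda x : \tau_1.e$.

5. If $\tau = \tau_1 \times \tau_2$, then $\exists v_1, v_2$ such that $v = (v_1,\, v_2)$.

6. If $\tau = \tau_1 + \tau_2$, then $\exists v'$ such that either $v = \mathrm{inj}_1\; v'$ or $v = \mathrm{inj}_2\; v'$.

7. If $\tau = (\mu\alpha : k.\tau')\; \tau_1\; \tau_2 \cdots \tau_n$, then $\exists v'$ such that $v = \mathrm{fold}\; v'$.

8. If $\tau = \mathrm{ref}\; \tau'$, then $\exists l$ such that $v = l$.

9. If $\tau = \forall\alpha : k.\tau'$, then $\exists e$ such that $v = \Lambda\alpha : k.e$.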

Directly  by  typing  inversion. 
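The content of the canonical forms lemma is essentially a lookup from the head constructor of a type to the admissible head constructors of its values. A minimal sketch of that lookup follows; the string tags and the table name `CANONICAL_HEADS` are assumptions for illustration, and only a subset of the type formers is covered.

```python
# Hypothetical tags: type head constructor -> allowed value head constructors.
CANONICAL_HEADS = {
    "unit": {"unit"},
    "arrow": {"lam"},
    "prod": {"pair"},
    "sum": {"inj1", "inj2"},
    "mu": {"fold"},
    "ref": {"loc"},
}


def has_canonical_form(value, ty):
    """Check that the head of a value matches the head of its claimed type."""
    return value[0] in CANONICAL_HEADS[ty[0]]
```

A progress argument uses exactly this information: once the shape of a value is known from its type, the matching reduction rule can fire.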

Theorem C.18 (Progress) If $\bullet; \Sigma; \bullet \vdash e : \tau$ and $\mu \sim \Sigma$, then either $\mu, e \longrightarrow \mathrm{error}$, or $e$ is a value $v$, or there exist
$\mu'$ and $e'$ such that $\mu, e \longrightarrow \mu', e'$.

We proceed by induction on the typing derivation for $e$. We do not consider cases where $e = v$ (since the theorem
is trivial in that case), or where $e = E[e']$ with $e' \neq v$. In that case, by typing inversion we can get that $e'$ is well-typed
under the empty context, so by induction hypothesis we can either prove that $\mu, e \longrightarrow \mathrm{error}$, or there exist
$\mu'$, $e''$ such that $\mu, E[e'] \longrightarrow \mu', E[e'']$ by the environment closure small-step rule. Thus we only consider cases
where $e = E[v]$, or where $e$ cannot be further decomposed into $E[e']$ with $E \neq \bullet$. Last, when we don't mention a
specific $\mu'$, we have that $\mu' = \mu$ with the desired properties obviously holding.

•  ;  Z;  •hv:n(i^).x  •hT:K 

Case - — - — -  > 

.;Z;.hvr:rxlo,i-(r) 

By  use  of  the  canonical  forms  lemma  C.17,  we  get  that  v  =  A{K).e. 

By  typing  inversion  we  get  that  Z;  •  h  [e]Q  ^  :  x'. 

So  applying  the  appropriate  operational  semantics  rule  we  get  an  e'  =  [e]Q  ^  •  T  such  that  {p  ,  e)  — )■  {p  ,  e' ). 

•  ;  Z;  •  h  V  :  Z(^).x  •,  Z;  •,  x  :  |"x]o  ^  h  ^  :  x'  •;*l-x':* 

Case -  ’  — f -  > 

•  ;  Z;  •  h  unpack  v  [.)x.{e  )  :  x 

By  use  of  the  canonical  forms  lemma  C.17,  we  get  that  v  =  pack  T  return  (.x")  with  v' . 

Furthermore  we  have  that  0  1  well-defined,  so  such  will  be  [e']  q  j  •  T  too. 

Thus  the  relevant  operational  semantics  rule  applies. 
Case
    •; Σ; • ⊢ v : τ → τ'    •; Σ; • ⊢ e' : τ
    -----------------------------------------  ▷
    •; Σ; • ⊢ v e' : τ'

From canonical forms, we have that v = λx : τ.e'', so the relevant step rule applies.

Case
    •; Σ; • ⊢ e : τ1 × τ2    i = 1 or 2
    ------------------------------------  ▷
    •; Σ; • ⊢ proj_i e : τ_i

From canonical forms, we have that v = (v1, v2), so using the relevant step rule for proj_i, we get that
(μ, proj_i e) → (μ, v_i).


Case
    •; Σ; • ⊢ v : τ1 + τ2    •; Σ; •, x : τ1 ⊢ e1 : τ    •; Σ; •, x : τ2 ⊢ e2 : τ
    ------------------------------------------------------------------------------  ▷
    •; Σ; • ⊢ case(v, x.e1, x.e2) : τ

From canonical forms, we have that either v = inj1 v' or v = inj2 v'; in each case a step rule applies to give an
appropriate e'.


Case
    • ⊢ μα : k.τ : k    •; Σ; • ⊢ v : (μα : k.τ) τ1 τ2 ··· τn
    ----------------------------------------------------------  ▷
    •; Σ; • ⊢ unfold v : τ[μα : k.τ/α] τ1 τ2 ··· τn

From canonical forms, we get that v = fold v', so the relevant step rule trivially applies.


Case
    •; Σ; • ⊢ v : τ
    ------------------------  ▷
    •; Σ; • ⊢ ref v : ref τ

Assuming an infinite heap, we can find an l such that l ∉ μ, and construct μ' = μ, l ↦ v. Thus the relevant step
rule applies giving e' = l.


Case
    •; Σ; • ⊢ v : ref τ
    --------------------  ▷
    •; Σ; • ⊢ !v : τ

From canonical forms, we get that v = l. By typing inversion, we get that (l : τ) ∈ Σ.
From μ ~ Σ, we get that there exists v' such that (l ↦ v') ∈ μ.

Thus the relevant step rule applies and gives e' = v'.


Case
    •; Σ; • ⊢ v : Πα : k.τ'    •; • ⊢ τ : k
    ----------------------------------------  ▷
    •; Σ; • ⊢ v τ : τ'[τ/α]

From canonical forms, we get that v = Λα : k.e. The relevant step rule trivially applies to give e' = e[τ/α].


Case
    •; Σ; •, x : τ ⊢ e : τ
    ----------------------------  ▷
    •; Σ; • ⊢ fix x : τ.e : τ

Trivially we have that the relevant step rule applies giving e' = e[fix x : τ.e/x].


Case 


h  ['F'Jq  wf 


•  ^T  -.K 


(xloi  :  * 


•  ;  Z;  •  h  unify  T  return  (.x)  with  (*F'.  T'^e)  :  ([x]oj-(r))  +  unlt 
The semantics is non-deterministic at this point; we fix this in the next section by giving more precise
semantics to the patterns and to the unification procedure. In either case, we split cases on whether a unifying
substitution with the desired properties exists or not, and use the appropriate step rule to get an e' in each case.
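
As an illustration of the shape of the progress argument, the following is a minimal sketch for a hypothetical fragment with only unit, pairs, projections and references. The class names, the `step` and `run` functions and the dictionary heap are assumptions made for this example and are not taken from the report. Every call to `step` returns exactly one of the three outcomes named in Theorem C.18: the term is a value, the configuration steps, or it goes to error.

```python
from dataclasses import dataclass, replace

# Hypothetical mini-language; it only mirrors the pair and reference cases above.
@dataclass(frozen=True)
class Unit:
    pass

@dataclass(frozen=True)
class Loc:
    addr: int

@dataclass(frozen=True)
class Pair:
    fst: object
    snd: object

@dataclass(frozen=True)
class Proj:
    index: int
    arg: object

@dataclass(frozen=True)
class Ref:
    arg: object

@dataclass(frozen=True)
class Deref:
    arg: object

def is_value(e):
    if isinstance(e, (Unit, Loc)):
        return True
    return isinstance(e, Pair) and is_value(e.fst) and is_value(e.snd)

def step(heap, e):
    """Return ('value',), ('error',) or ('step', new_heap, new_term)."""
    if is_value(e):
        return ("value",)
    # Evaluation-context closure: reduce the leftmost non-value subterm.
    for field in ("fst", "snd", "arg"):
        sub = getattr(e, field, None)
        if sub is not None and not is_value(sub):
            res = step(heap, sub)
            if res[0] != "step":
                return res
            _, heap2, sub2 = res
            return ("step", heap2, replace(e, **{field: sub2}))
    if isinstance(e, Proj):
        # Canonical forms: a well-typed argument must be a pair.
        if isinstance(e.arg, Pair) and e.index in (1, 2):
            return ("step", heap, e.arg.fst if e.index == 1 else e.arg.snd)
        return ("error",)
    if isinstance(e, Ref):
        fresh = max(heap, default=-1) + 1  # a location not yet in the heap
        return ("step", {**heap, fresh: e.arg}, Loc(fresh))
    if isinstance(e, Deref):
        if isinstance(e.arg, Loc) and e.arg.addr in heap:
            return ("step", heap, heap[e.arg.addr])
        return ("error",)
    return ("error",)

def run(e, heap=None):
    """Iterate step until a value or an error is reached."""
    heap = dict(heap or {})
    while True:
        res = step(heap, e)
        if res[0] != "step":
            return res[0], heap, e
        _, heap, e = res
```

For instance, `run(Proj(1, Pair(Deref(Ref(Unit())), Unit())))` allocates one location, reads it back and projects the first component, while an ill-formed term such as `Proj(1, Unit())` reaches the error outcome.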
D. Typing and unification for patterns

D.1 Adjusting computational language typing

First,  we  will  define  two  new  notions:  one  is  a  stricter  typing  for  patterns,  allowing  only  certain  forms  to  be 
used;  the  second  is  relevant  typing  for  patterns,  making  sure  that  all  declared  unification  variables  are  actually 
used  somewhere  inside  the  pattern.  Together  they  are  supposed  to  make  sure  that  unification  is  possible  using  a 
decidable  deterministic  algorithm;  so  there  is  only  one  unifying  substitution,  or  there  is  none. 

We  change  the  pattern  matching  typing  rule  for  the  computational  language  as  follows: 

'¥hT:K  'F hp  ['F']  1^1  wf 

'F,  ['F']  1^1  hp  \r]  1^1  1^, I  :  K  relevant  ('F,  ['F']  hp  \r]  :  ^)  =  ^F,  ['F'] 

'F,  ['F']|^|;r;rh  [T'] 

*F;  Z;  r  h  unify  T  return  (.x)  with  (*F'.r'  ^  e')\  { [x]  nj,|  j  •  (id>i<,  T))  +  unit 

Then we define the stricter typing for patterns, ⊢p. This will be entirely identical to normal typing, but will
disallow forms that would lead to non-determinism (e.g. context unification variables allowed anywhere inside
a pattern).

Then we define the notion of relevancy for extension variables. For a judgement Ψ ⊢ J, relevant (Ψ ⊢ J) = Ψ̂,
where Ψ̂ is a partial context, containing only the extension variables that actually get used. We will show that
functions used during typing and evaluation commute with this function.

Then,  we  prove  that  either  a  unique  unification  exists  for  a  pair  of  a  pattern  and  a  term,  yielding  a  partial 
substitution  for  the  relevant  variables,  or  that  no  such  unification  exists.  From  this  proof  we  derive  an  algorithm 
for  unification. 
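
The role of the two restrictions can be seen on a much simpler setting than the one in this appendix. The sketch below is a plain first-order matcher over tuples and strings, not the report's unification procedure; the term encoding and the names `match`, `pattern_vars` and `is_relevant` are assumptions made for the example. It shows that matching yields either one substitution or none, and that a declared variable which never occurs in the pattern would be left unconstrained, which is what the relevancy requirement rules out.

```python
def match(pattern, term, subst=None):
    """First-order matching: return the unique substitution, or None.

    Patterns are strings (constants), ("var", name) for unification
    variables, or tuples of sub-patterns. Terms contain no variables.
    """
    subst = dict(subst or {})
    if isinstance(pattern, tuple) and pattern[:1] == ("var",):
        name = pattern[1]
        if name in subst and subst[name] != term:
            return None  # the same variable would need two different terms
        subst[name] = term
        return subst
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def pattern_vars(pattern):
    """Collect the unification variables that actually occur in a pattern."""
    if isinstance(pattern, tuple):
        if pattern[:1] == ("var",):
            return {pattern[1]}
        return set().union(*(pattern_vars(p) for p in pattern))
    return set()

def is_relevant(declared, pattern):
    """True when every declared unification variable is used by the pattern."""
    return set(declared) <= pattern_vars(pattern)
```

With the pattern `("and", ("var", "X"), "Q")` and declared variables `X` and `Y`, the check `is_relevant` fails: any term could be chosen for `Y`, so infinitely many substitutions would be valid.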

D.2  Strict  typing  for  patterns 

Definition D.1 (Pattern typing) We will adapt the typing rules for extended terms T, to show which of those
terms are accepted as valid patterns. We assume that the Ψ is split into two parts, Ψ, Ψ_u, where Ψ_u contains
only newly-introduced unification variables just for the purpose of type-checking the current pattern and branch.


Ψ ⊢p Ψ_u wf

    -----------
    Ψ ⊢p • wf

    Ψ ⊢p Ψ_u wf    Ψ, Ψ_u ⊢p [Φ] t : [Φ] s
    ----------------------------------------
    Ψ ⊢p (Ψ_u, [Φ] t) wf

    Ψ ⊢p Ψ_u wf    Ψ, Ψ_u ⊢p Φ wf
    -------------------------------
    Ψ ⊢p (Ψ_u, [Φ] ctx) wf

Ψ, Ψ_u ⊢p T : K

    Ψ, Ψ_u; Φ ⊢p t : t'    Ψ, Ψ_u; Φ ⊢p t' : s
    --------------------------------------------
    Ψ, Ψ_u ⊢p [Φ] t : [Φ] t'

    Ψ, Ψ_u ⊢p Φ, Φ' wf
    -----------------------------
    Ψ, Ψ_u ⊢p [Φ] Φ' : [Φ] ctx

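Ψ, Ψ_u ⊢p Φ wf

    ----------------
    Ψ, Ψ_u ⊢p • wf

    Ψ, Ψ_u ⊢p Φ wf    Ψ, Ψ_u; Φ ⊢p t : s
    --------------------------------------
    Ψ, Ψ_u ⊢p (Φ, t) wf

    Ψ, Ψ_u ⊢p Φ wf    (Ψ, Ψ_u).i = [Φ] ctx    i < |Ψ|
    ---------------------------------------------------
    Ψ, Ψ_u ⊢p (Φ, X_i) wf

    Ψ ⊢p Φ wf    (Ψ, Ψ_u).i = [Φ] ctx    i > |Ψ|
    ----------------------------------------------
    Ψ, Ψ_u ⊢p Φ, X_i wf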
    Ψ, Ψ_u; Φ ⊢p σ : Φ'    Ψ, Ψ_u; Φ ⊢p t : t' · σ
    ------------------------------------------------
    Ψ, Ψ_u; Φ ⊢p (σ, t) : (Φ', t')

    Ψ, Ψ_u; Φ ⊢p σ : Φ'    (Ψ, Ψ_u).i = [Φ'] ctx    Φ', X_i ⊆ Φ
    -------------------------------------------------------------
    Ψ, Ψ_u; Φ ⊢p (σ, id(X_i)) : (Φ', X_i)

c-.te'L  ^\  =  t  (5,5') 

'P,  ^>u\^^pc-.t  'P,  'i'u,  <^hpfi:t  'i>u,<^hps:s' 


'¥,'i>u,<^hpti:s  'i>,'i>u,<^,tihp\t2]\^\:s  {s,s',s")£Jl 


, 'P„;Ohpn(A)T2 

// 

:  ^ 

0  hp  fi  : 

:5  'P, 'P„;0,  fiHp  rf2l|<i,| 

Ohpn(ri).[r'J 

'V,  ^>u\ 

OhpMA)T2:n(A).[/J|^l,. 

'V,  'P„;Ohp 

t2-t 

;  [f'lioi 

•  (id<i)A2) 

;  0  hp  A  :  f 

'P,  %;Ohpf2:f 

'i',  'i'u 

;  0  hp  f :  Type 

*P„;  0  hp  A  =  f2  : 

Prop 

('P,  'P 

u).i  =  T  T-- 

=  [0']f'  /<|'F| 

'F,  'P 

Ohpa:0' 

^u,^^pXi/a:t'-a 

{'¥,'¥u)-i  =  T  T  =  [^']t'  />|'P|  'F, <I>  hp  a  :  O'  O' C  O  a  =  ido' 

'P,  'i>u,<^hpXi/a:t'-a 

'¥,'i>u,<^hpt:h  *y»;Ohp?i:Prop  '¥, '¥u,  <^hp  t' :  h  =  t2 

'P,  %;  Ohp  comtt’  :t2 

>y»;Ohpfi=fi:Prop  O :  fi  =  fz 

'P,  ^u,  ^  ^p  refi  h:ti=  t[  'F,  O  \-p  symm  :  f2  =  h 

O  hp  fa  :  fi  =  f2  *1^,  'i^u,  O  Hp  ffo  :  f2  =  fa 
'P,  'Py;  O  \-p  trans  ta  tb:h=  I3 

Ohp?,:Mi  =M2 

OhpMi  :A^fi  >y,  O  hp  : M  =  A^2  *y,  O  hp M  :  A 

'P,  'P„;  O  hp  congapp  f/,  :MiNi=  M2  N2 

*P,  Ohpffl  :Ai  =A2 

O, Ai  hp  [fj,]  :  fii  =  g2  *y,  *y»;  O  hp  Ai  :  Prop  O, Ai  hp  [gi]  :  Prop 

'P,  ^u,  ^  Hp  congimpi  ta  (k{Ai).tb)  ■  n(Ai).  [BiJ  =  n(A2).  [^2] 

'V,  ^>u\  o,  A  hp  \tb\  :B  =  B'  %;  O  hp  n(A).  [gj  =  n(A).  [g'J  :  Prop 
^u,  O  hp  congpi  iXiA).tb)  :  n(A).  [Bj  =  n(A).  [B'J 

O,  A  hp  \tb]  :Bi=B2  '¥,  '¥u,  O  hp  X(A).  [BiJ  =  X(A).  [^2]  :  Prop 
'P,  ^u,  ^  ^p  conglam  {X{A).tb)  ■  X{A).  [BiJ  =  X(A).  [B2J 

»P„;OhpX(A).M:A^B  »F„;  O  hp  :  A  »F„;  O  hp  A  ^  B  :  Type 

Ohpbeta  (X(A).M)A^i^^(A).M)A^=  [M]  •  (id<i.,A^) 


Now we need to prove that all theorems that had to do with these typing judgements still hold. In most cases,
this holds entirely trivially, since the ⊢p judgements are exactly the same as the ⊢ judgements, with some extra
restrictions as side-conditions. The only theorems that we need to reprove are the ones that require special care
in exactly those rules that now have side-conditions. As these rules all have to do just with the use of extension
variables, the theorems that we need to adapt are the extension substitution lemmas. Their
statements need to be adjusted to account for part of the substitution corresponding to the Ψ part, and part of
it corresponding to the Ψ_u part (both in the source and target extension contexts of the substitution). Though
we do not provide the details here, the main argument why these continue to hold is the following: we never
substitute variables from Ψ_u with anything other than the same variable in a context that includes the same Ψ_u.
Thus the side-conditions will continue to hold.

Theorem  D.2  (Extension  of  lemma  B. 97)  (a>i<,A|>j(/|,  •••  :  (*T,  *T„)  then: 

1.  //*T,  ^hpf.t'  then  'T',  \-p  t  ■  :  t'  ■  avp. 

2.  //'T,  ^>u,^hpa:^'then^'',  <I>  •  hp  a  •  avp  :<!)'•  a^. 

3.  //'T,  hp  <f>  wfthen  'T',  hp  <I>  •  wf 

4.  If^,  ^u^pT  :K  then  'T',  hpT :  K 


In all cases we proceed entirely similarly as before. The only special cases that need to be accounted for are the
ones that have to do with restrictions on variables coming out of Ψ_u.

Case
    Ψ ⊢p Φ wf    (Ψ, Ψ_u).i = [Φ] ctx    i > |Ψ|
    ----------------------------------------------  ▷
    Ψ, Ψ_u ⊢p Φ, X_i wf

We need to prove that Ψ', Ψ_u · σ_Ψ ⊢p Φ · σ'_Ψ, X_i · σ'_Ψ wf, where σ'_Ψ = σ_Ψ, X_{|Ψ'|}, ···, X_{|Ψ'|+|Ψ_u|}. By induction
hypothesis for Ψ_u = • we get that Ψ' ⊢p Φ · σ_Ψ wf, and because of lemma B.92 we get that also Ψ' ⊢p Φ · σ'_Ψ wf.
Also, we have that X_i · σ'_Ψ = X_{i−|Ψ|+|Ψ'|}.

We have that (Ψ', Ψ_u · σ_Ψ).(i − |Ψ| + |Ψ'|) = [Φ · σ'_Ψ] ctx.

Last, since i > |Ψ|, we also have that i − |Ψ| + |Ψ'| > |Ψ'|.

Thus by the use of the same typing rule, we arrive at the desired.

^  {^,^u).i  =  T  T  =  [^y  />|»T|  <D  hp  a  :  <D'  a  =  id^/ 

Case - ; -  > 

'T,  'T„;<I>hpA,/a:t'-a 

Similarly as above. Furthermore, we need to show that id_{Φ'} · σ'_Ψ = id_{Φ' · σ'_Ψ}, which is simple to prove by induction
on Φ'.

Lemma  D.3  (Extension  of  lemma  B.98)  hp  wfthen  hp  •  a>j<  wf. 

Similarly  to  lemma  B.98  and  use  of  the  above  lemma. 

D.3  Relevant  typing 

We will proceed to define a notion of partial contexts: these are extension contexts where certain elements
are unspecified. It is presumed that in the judgements that they appear, only specified elements are relevant; the
judgements do not depend on the other elements at all (save for them being well-formed). We will use this notion
in order to make sure that all unification variables introduced during pattern matching are relevant. Otherwise,
the irrelevant variables could be substituted by arbitrary terms, resulting in the existence of an infinite number
of valid unification substitutions.

Definition  D.4  The  syntax  for  partial  contexts  is  defined  as  follows. 
Definition D.5 Well-formedness for partial contexts is defined as follows.

⊢ Ψ̂ wf

    --------
    ⊢ • wf

    ⊢ Ψ̂ wf    Ψ̂ ⊢ [Φ] t : [Φ] s
    -----------------------------
    ⊢ (Ψ̂, [Φ] t) wf

    ⊢ Ψ̂ wf    Ψ̂ ⊢ Φ wf
    ---------------------
    ⊢ (Ψ̂, [Φ] ctx) wf

    ⊢ Ψ̂ wf
    -------------
    ⊢ (Ψ̂, ?) wf

Other than the above change in the well-formedness definition, partial contexts are used with entirely the
same definitions as before. This means that if a typing judgement like Ψ̂; Φ ⊢ t : t' tries to access the i-th
metavariable, this metavariable should be specified in Ψ̂ rather than being the unspecified element ?, because
the side-condition Ψ̂.i = K would otherwise be violated.

We proceed to define a judgement that extracts the relevant extension variables out of typing judgements that
use a concrete context Ψ, yielding a partial context Ψ̂. We first need a couple of definitions.

Definition  D.6  The  fully-unspecified  partial  context  is  defined  as  follows. 


unspec_Ψ

unspec_• = •

unspec_{Ψ, K} = unspec_Ψ, ?


Definition  D.7  The  partial  context  specified  solely  at  i  is  defined  as  follows. 
Ψ@i

(Ψ, K)@i = unspec_Ψ, K when |Ψ| = i
(Ψ, K)@i = (Ψ@i), ? when |Ψ| > i


Definition  D.8  Joining  two  partial  contexts  is  defined  as  follows. 


Ψ̂ ∘ Ψ̂'

(Ψ̂, K) ∘ (Ψ̂', K) = (Ψ̂ ∘ Ψ̂'), K
(Ψ̂, ?) ∘ (Ψ̂', K) = (Ψ̂ ∘ Ψ̂'), K
(Ψ̂, K) ∘ (Ψ̂', ?) = (Ψ̂ ∘ Ψ̂'), K
(Ψ̂, ?) ∘ (Ψ̂', ?) = (Ψ̂ ∘ Ψ̂'), ?


Definition  D.9  The  notion  of  one  partial  context  being  a  less  precise  version  of  another  one  is  defined  as  follows. 


• ⊑ •

(Ψ̂, K) ⊑ (Ψ̂', K) ⇐ Ψ̂ ⊑ Ψ̂'
(Ψ̂, ?) ⊑ (Ψ̂', K) ⇐ Ψ̂ ⊑ Ψ̂'
(Ψ̂, ?) ⊑ (Ψ̂', ?) ⇐ Ψ̂ ⊑ Ψ̂'
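
The four operations of Definitions D.6 to D.9 can be sketched concretely by representing a context as a tuple of entries and the unspecified element ? as `None`. This encoding, the function names, and the choice to raise an error when a join is undefined (mismatched lengths or two different specified entries) are assumptions of the example; the definitions above simply leave those cases without a rule.

```python
UNSPEC = None  # stands for the unspecified element "?"

def unspec(ctx):
    """D.6: the fully-unspecified partial context of the same length as ctx."""
    return tuple(UNSPEC for _ in ctx)

def at(ctx, i):
    """D.7: the partial context specified solely at position i (0-based)."""
    return tuple(k if j == i else UNSPEC for j, k in enumerate(ctx))

def join(a, b):
    """D.8: pointwise join of two partial contexts."""
    if len(a) != len(b):
        raise ValueError("join is only defined for contexts of equal length")
    out = []
    for x, y in zip(a, b):
        if x is not UNSPEC and y is not UNSPEC and x != y:
            raise ValueError("join undefined: entries disagree")
        out.append(x if x is not UNSPEC else y)
    return tuple(out)

def less_precise(a, b):
    """D.9: a is a less precise version of b."""
    return len(a) == len(b) and all(
        x is UNSPEC or x == y for x, y in zip(a, b)
    )
```

For a concrete context `psi = ("K0", "K1", "K2")`, `join(at(psi, 0), at(psi, 2))` gives `("K0", None, "K2")`, which is less precise than `psi` itself and more precise than `unspec(psi)`.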
Definition D.10 We define a judgement to extract the relevant extension variables out of a context.


relevant (Ψ ⊢ T : K) = Ψ̂

relevant ( Ψ; Φ ⊢ t : t'    Ψ; Φ ⊢ t' : s
           -------------------------------
           Ψ ⊢ [Φ] t : [Φ] t' )              = relevant (Ψ; Φ ⊢ t : t')

relevant ( Ψ ⊢ Φ, Φ' wf
           ------------------------
           Ψ ⊢ [Φ] Φ' : [Φ] ctx )            = relevant (Ψ ⊢ Φ, Φ' wf)


relevant (Ψ ⊢ Φ wf) = Ψ̂

relevant ( -----------
           Ψ ⊢ • wf )                        = unspec_Ψ

relevant ( Ψ ⊢ Φ wf    Ψ; Φ ⊢ t : s
           -------------------------
           Ψ ⊢ (Φ, t) wf )                   = relevant (Ψ; Φ ⊢ t : s)

relevant ( Ψ ⊢ Φ wf    Ψ.i = [Φ] ctx
           --------------------------
           Ψ ⊢ (Φ, X_i) wf )                 = relevant (Ψ ⊢ Φ wf) ∘ (Ψ@i)
relevant (Ψ; Φ ⊢ t : t') = Ψ̂


/c:fGr\  <^.i  =  t  \ 

relevant  -  =  relevant  (*F  h  <I>  wf)  relevant  -  =  relevant  h  <I>  wf) 

y'F;<I>hc:ry  ^  ^  ^  ’ 

(  {s,s)^A\ 

relevant  - ^  =  relevant  (*F  h  <I>  wf) 

\  I  s  •  s  j 

'^■^htr.s  'P;  <!>,  fi  h  rf2l|ci>|  {s,s',s”)eJl 

^  ^hn{h).t2--s"  _ 

'F;  <I> h  n(ri).  [r'J  : 


relevant 


=  relevant  ( *F;  <I>,  ti  h  |'f2l  |<j)|  :  s' 


relevant 


'P;<I>hX(ri).r2:n(ri).[r'J  . 


relevant  ( *F;  <I>,  ti  h  |'r2l  loi  :  t' 


relevant 


"'F;  Ohfi  :n(0.f'  'F;<I>hr2:f' 


=  relevant  (*F;  <I>  h  fi  :  n(r )  .r')  o  relevant  (*F;  <I>  h  r2  :  0 


'P;<I>hr2:f  'F;  <I>hr  :Type' 


relevant 


relevant  (*F;  <I>  h  fi  :  f)  o  relevant  (*F;  <I>  h  r2  :  0  relevant 


*F;  <I>  h  fi  =  f2  :  Prop  J 

\'¥).i  =  T  r  =  [<!>']?'  <I>ha  :<!>'' 


'¥-<^hXi/a:t'a 


I'Pj— i  times 

(relevant  ('Ft;h  [O']  t' :  [O']  s) ,  VV^)  o  relevant  (r^;  O  h  a  :  O')  o  (»F@/) 


relevant 


relevant 


''P;Ohr:ri  'F;Ohri:Prop  rP;  O  h  r' :  fi  =  f2' 
*F;  O  h  conv  1 1'  :t2 

\  / 

relevant  (*F;  O  h  r :  fi)  o  relevant  (*F;  O  h  f' :  fi  =  12) 

'P;Ohri:r  rp;  Ohfi  =ri  :  Prop\ 


relevant 


*P;  O  h  refi  fi  :  ri  =  h 


O  h  :  ?i  =  ?2 

*F;  O  h  symm  ta'-t2  =  t\ 


=  relevant  (*F;  O  h  fi  :  r) 


=  relevant  (*F;  O  h  :  fi  =  (2) 


relevant 


*F;  O  h  :  fi  =  r2  *F;  O  h  :  f2  =  ^3 


relevant 


rfr;  Ohtransrflffoiri  =f3  J 
relevant  (*F;  O  h  :  fi  =  f2)  o  relevant  (*F;  ^ 'r  tb '■  t2  =  h) 
'‘P;  Ohf„  :Mi  =M2  'F;OhMi:A^B  tt '.Ni  =  N2 


O  h  M  :  A 


*F;  O  h  congapp  ta  h  '.MiNi=  M2  N2 
relevant (Ψ; Φ ⊢ t_a : M1 = M2) ∘ relevant (Ψ; Φ ⊢ t_b : N1 = N2)


relevant 


'F;  <I>hfa  :Ai  =A2  'F;  <I>,Ai  h  :Bi  =^2  'F;  <I>  h  Ai  :  Prop  'F;  <I>,Ai  h  [Bi]  :  Prop 

'F;  O  h  congimpi  fa  (?t(Ai).fz,)  :  n(Ai).  [BiJ  =  n(A2).  [^2] 

relevant  (*F;  <I>  h  :  Ai  =  A2)  o  relevant  (*P;  O, Ai  \-  \tb  \  '■  B\  =  B2) 

'F;*!),  Ah  \tb  \  :B  =  B'  'F;  <I>  h  n(A).  [Bj  =n(A).[B'J  :  Prop\  _ 


relevant 


relevant 


'F;  <!>  h  congpl  {X{A).tb)  :  n(A).  [Bj  =  n(A).  [B'J  J 

relevant  (*F;  O,  A  h  \tt\  :  B  =  B') 

''¥■  <!>,  A  h  \tb]  :  Bi  =  B2  'F;  <!>  h  X(A).  [BiJ  =  X(A).  [B2J  :  Prop\ 


relevant 


<I>  h  conglam  (k{A).tb)  '■  ^(A).  [BiJ  =  X(A).  [B2J 

relevant  (*F;  <I>,  A  h  \tb]  :  Bi  =  B2) 

''P;  <I>hX(A).M:A^B  'F;<I>hAf:A  'F;  O  h  A  ^  B  :  Type 
^  'P;  <I>  h  beta  (X(A).M)  N  :  (k{A)M)  N=\M'\-  (ld<i>,A^) 

relevant  (*F;  <I>  h  X{A).M :  A  — ^  B)  o  relevant  (*F;  <I>  h  Af :  A) 


} 


'P;  O  h  a  :  <!>' 


relevant 


'P;  Oh 


=  relevant  (*F  h  O  wf) 


relevant 


''P;  Oh  a:  O'  'F;Ohf:f 
'F;  O  h  (a,  t)  :  (O',  t') 


relevant  (*F;  O  h  a  :  O')  o  relevant  (*F;  O  h  f :  f'  •  a) 


relevant 


''F;  O  h  a  :  O'  {'¥).i  =  [O']  ctx  O',  X,  C  o' 
'F;Oh(a,  id(X,-)):(0',X,-) 


relevant  (*F;  O  h  a  :  O') 
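
To make the intent of Definition D.10 concrete, the following toy version computes which entries of an extension context a term actually depends on. It is only a sketch under strong simplifying assumptions: terms are nested tuples with tags `"meta"`, `"app"` and `"const"`, contexts are tuples of opaque kind names, and the dependencies of each kind on earlier metavariables are passed in explicitly as the hypothetical `deps` mapping instead of being read off typing derivations. Marking position i and then following its dependencies plays the role of joining Ψ@i with the relevancy of the kind of X_i.

```python
def used_metas(term):
    """Indices of the metavariables that occur syntactically in a toy term."""
    tag = term[0]
    if tag == "meta":
        return {term[1]}
    if tag == "app":
        return used_metas(term[1]) | used_metas(term[2])
    return set()  # constants mention no metavariables

def relevant(psi, deps, term):
    """Partial context (None = "?") keeping only the entries of psi that
    the term uses, directly or through the kinds of used metavariables."""
    todo = list(used_metas(term))
    seen = set()
    while todo:
        i = todo.pop()
        if i not in seen:
            seen.add(i)
            todo.extend(deps.get(i, ()))
    return tuple(k if i in seen else None for i, k in enumerate(psi))
```

With `psi = ("K0", "K1", "K2", "K3")`, `deps = {2: {0}}` and the term `("app", ("meta", 2), ("const", "c"))`, only positions 0 and 2 remain specified.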


Lemma D.11 (More-informed contexts preserve judgements) Assuming Ψ̂ ⊑ Ψ̂':

1. If Ψ̂ ⊢ T : K then Ψ̂' ⊢ T : K.

2. If Ψ̂ ⊢ Φ wf then Ψ̂' ⊢ Φ wf.

3. If Ψ̂; Φ ⊢ t : t' then Ψ̂'; Φ ⊢ t : t'.

4. If Ψ̂; Φ ⊢ σ : Φ' then Ψ̂'; Φ ⊢ σ : Φ'.


Simple by structural induction on the judgements. The interesting cases are the ones mentioning extension
variables, as for example when Φ = Φ', X_i or t = X_i/σ. In both such cases, the typing rule has a side condition
requiring that Ψ̂.i = T. Since Ψ̂ ⊑ Ψ̂', we have that Ψ̂'.i = T.

Lemma D.12 (Relevancy is decidable) 1. If Ψ ⊢ T : K, then there exists a unique Ψ̂ such that relevant (Ψ ⊢ T : K) = Ψ̂.

2. If Ψ ⊢ Φ wf, then there exists a unique Ψ̂ such that relevant (Ψ ⊢ Φ wf) = Ψ̂.

3. If Ψ; Φ ⊢ t : t', then there exists a unique Ψ̂ such that relevant (Ψ; Φ ⊢ t : t') = Ψ̂.

4. If Ψ; Φ ⊢ σ : Φ', then there exists a unique Ψ̂ such that relevant (Ψ; Φ ⊢ σ : Φ') = Ψ̂.

The relevancy judgements are defined by structural induction on the corresponding typing derivations. It is
crucial to take into account the fact that ⊢ Ψ wf and Ψ ⊢ Φ wf are implicitly present along any typing derivation
that mentions such contexts; thus these derivations themselves, as well as their sub-derivations, are structurally
included in derivations like Ψ; Φ ⊢ t : t'. Furthermore, it is easy to see that all the joins used are defined, since
in most cases two results of the relevancy procedure on a judgement using the same context are joined, which
is always well-defined. In the only case where this does not hold (use of extension variables in terms), the joins
are still defined because of the adaptation of the resulting Ψ̂ by affixing the unspecified elements.

Lemma D.13 (Properties of context join) 1. Ψ̂1 ∘ Ψ̂2 ⊒ Ψ̂1

2. Ψ̂1 ∘ Ψ̂2 ⊒ Ψ̂2

3. Ψ̂1 ∘ Ψ̂2 = Ψ̂2 ∘ Ψ̂1

4. (Ψ̂1 ∘ Ψ̂2) ∘ Ψ̂3 = Ψ̂1 ∘ (Ψ̂2 ∘ Ψ̂3)

5. If Ψ̂1 ⊑ Ψ̂2 then Ψ̂1 ∘ Ψ̂2 = Ψ̂2

6. If Ψ̂1 ⊑ Ψ̂1' then Ψ̂1 ∘ Ψ̂2 ⊑ Ψ̂1' ∘ Ψ̂2

All  are  simple  to  prove  by  induction. 
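
These properties can also be checked exhaustively on small contexts, which is a useful sanity test even though it is not a proof. The sketch below assumes the tuple encoding with `None` for ?, returns `None` when a join is undefined, and reads properties 1 and 2 as saying that each argument is less precise than the join; that reading, and all function names, are assumptions of the example.

```python
from itertools import product

def join(a, b):
    """Pointwise join; None when two specified entries disagree."""
    out = []
    for x, y in zip(a, b):
        if x is not None and y is not None and x != y:
            return None
        out.append(y if x is None else x)
    return tuple(out)

def leq(a, b):
    """a is a less precise version of b (same length assumed)."""
    return all(x is None or x == y for x, y in zip(a, b))

def check_join_properties(kinds=("A", "B"), length=2):
    """Brute-force check of the six properties over all small contexts."""
    ctxs = list(product((None,) + tuple(kinds), repeat=length))
    for a, b in product(ctxs, repeat=2):
        ab = join(a, b)
        if ab != join(b, a):                    # property 3
            return False
        if ab is None:
            continue
        if not (leq(a, ab) and leq(b, ab)):     # properties 1 and 2
            return False
        if leq(a, b) and ab != b:               # property 5
            return False
        for c in ctxs:
            bc = join(b, c)
            right = None if bc is None else join(a, bc)
            if join(ab, c) != right:            # property 4
                return False
            cb = join(c, b)
            if leq(a, c) and cb is not None and not leq(ab, cb):
                return False                    # property 6
    return True
```

Calling `check_join_properties()` enumerates the nine partial contexts of length two over two kinds and every pair and triple built from them.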

Lemma D.14 (Relevancy when weakening the extensions context) In each item, the number of ? elements appended is |Ψ'|.

1. If Ψ ⊢ T : K, then relevant (Ψ, Ψ' ⊢ T : K) = relevant (Ψ ⊢ T : K), ?, ···, ?.

2. If Ψ ⊢ Φ wf, then relevant (Ψ, Ψ' ⊢ Φ wf) = relevant (Ψ ⊢ Φ wf), ?, ···, ?.

3. If Ψ; Φ ⊢ t : t', then relevant (Ψ, Ψ'; Φ ⊢ t : t') = relevant (Ψ; Φ ⊢ t : t'), ?, ···, ?.

4. If Ψ; Φ ⊢ σ : Φ', then relevant (Ψ, Ψ'; Φ ⊢ σ : Φ') = relevant (Ψ; Φ ⊢ σ : Φ'), ?, ···, ?.

Simple  to  prove  by  induction. 

Lemma D.15 (Relevancy of sub-judgements is implied) 1. (a) relevant (Ψ ⊢ Φ wf) ⊑ relevant (Ψ ⊢ Φ, Φ' wf)

(b) relevant (Ψ ⊢ Φ wf) ⊑ relevant (Ψ; Φ ⊢ t : t')

(c) relevant (Ψ ⊢ Φ wf) ⊑ relevant (Ψ; Φ ⊢ σ : Φ').

2. (a) If Ψ; Φ ⊢ t : t' then relevant (Ψ; Φ, Φ' ⊢ t : t') = relevant (Ψ; Φ ⊢ t : t') ∘ relevant (Ψ ⊢ Φ, Φ' wf).

(b) If Ψ; Φ ⊢ σ : Φ' then relevant (Ψ; Φ, Φ'' ⊢ σ : Φ') = relevant (Ψ; Φ ⊢ σ : Φ') ∘ relevant (Ψ ⊢ Φ, Φ'' wf).

3. (a) If Ψ; Φ ⊢ t : t' and Ψ; Φ ⊢ t' : s then relevant (Ψ; Φ ⊢ t' : s) ⊑ relevant (Ψ; Φ ⊢ t : t').

(b) If Ψ; Φ' ⊢ σ : Φ then relevant (Ψ ⊢ Φ wf) ⊑ relevant (Ψ; Φ' ⊢ σ : Φ).

4. (a) If Ψ; Φ ⊢ t : t' and Ψ; Φ' ⊢ σ : Φ, then relevant (Ψ; Φ' ⊢ t · σ : t' · σ) ⊑ relevant (Ψ; Φ ⊢ t : t') ∘
relevant (Ψ; Φ' ⊢ σ : Φ).

(b) If Ψ; Φ' ⊢ σ : Φ and Ψ; Φ'' ⊢ σ' : Φ', then relevant (Ψ; Φ'' ⊢ σ · σ' : Φ) ⊑ relevant (Ψ; Φ' ⊢ σ : Φ) ∘
relevant (Ψ; Φ'' ⊢ σ' : Φ').

Part 1(a) Trivial by induction on the derivation of relevancy.
Part 1(b) By inversion of the derivation of relevant (Ψ ⊢ Φ wf) = Ψ̂. In the base cases, this is directly proved
by the relevancy judgement; in the case where we have relevant (Ψ; Φ ⊢ t : t') = relevant (Ψ; Φ, t1 ⊢ t2 : t3),
by induction hypothesis we get the claim for relevant (Ψ ⊢ Φ, t1 wf), which by inversion gives us the desired; the
metavariables case trivially follows from repeated inversions of relevant (Ψ; Φ ⊢ σ : Φ').

Part  1(c)  Trivial  by  induction  and  use  of  part  1(b). 

Part  2  By  induction  on  the  typing  derivations  of  t  and  t'  all  cases  follow  trivially. 

Part  3(a)  By  induction  on  the  derivation  of  O  h  t  :  t'. 

Case  t  =  c  \>  Simply  using  the  above  parts  and  the  fact  that  •\-t':s,  we  have  that  relevant  (*T;  O  h  t'  :  5)  = 
unspec>j(0  relevant  (*T  h  O  wf)  =  relevant  (*T  h  O  wf)  C  relevant  (*T;  O  h  t :  t'). 

Case  t  =  s  \>  Similarly  as  the  above  case. 

Case  t  =  vi  \>  We  have  that  Otih  O.I :  s,  by  inversion  of  the  well-formedness  derivation  for  O.  Therefore 
relevant  (*T;  O  h  t' :  5)  =  relevant  (*T;  Otih  t'  \s)o  relevant  (*T  h  O  wf).  By  repeated  inversion  of  h  O  wf 
we  get  that  relevant  (*T  h  (01,i,  O.I)  wf)  C  relevant  (*T  h  O  wf).  Thus  we  have  that  relevant  (*T;  O  h  t' :  5)  □ 
relevant  (*T  h  O  wf),  which  proves  the  desired. 

Case  t  =  n(ti).t2  >  Trivially  from  the  fact  that  relevant  (*T;  O  h  5"  :  s'”)  =  relevant  (*T  h  O  wf)  □ 
relevant  (*T;  h  |'t2l) 

Case  t  =  ?i(ti).t2  >  We  have  that  relevant  (*T;  O  h  n(ti).  [t'J  :  5')  =  relevant  (*T;  O,  h  [[t'J]  :  s)  = 
relevant  (*T;  O,  t\\-  t'  \  s).  So  by  induction  hypothesis,  since  <1>,  h  t'  :  5  is  a  sub-derivation  in  'T;  <I>,  t\  h 
[t2l  :  t' ,  we  have  that  relevant  (*T;  <I>,  h  t' :  5)  C  relevant  (*T;  <I>,  h  t2  :  t'),  which  is  the  desired. 

Case  t  =  tit2  >  By  induction  hypothesis  get  that  relevant  (*T;  <1>  h  Yl{t).t' :  5)  C  relevant  (*T;  <I>  h  :  n(t).t'). 
(Here  we  assume  unique  typing  for  IT  types).  Furthermore,  we  have  that  relevant  (*F;  <I>  h  n(t).t' :  5)  = 
relevant  (*F;  <I>,  t  h  [?']  :s).  Otherwise,  it  is  simple  to  prove  that  relevant  (*T;  <I>I-  (Ido,  12)  '■  [1^))  = 

relevant  (*F;  <I>  h  t2  :  \t') ),  thus  the  desired  follows  trivially  following  part  4. 


times 

Case  t  =Xi  jo  \>  We  have  that  relevant  (*F;  <!>'  h  t' :  5)  =  relevant  <!>'  h  t' :  5) ,  ?,•••,?  from  inversion 
of  well-formedness  for  Furthermore,  we  have  that  *F;  <I>  h  a  :  <!>'  from  typing  inversion.  Thus,  using  part 

times 


4,  we  get  that  relevant  (*F;  <I>  h  t'  •  a  :  5)  C  (relevant  (*F  <!>'  h  t' :  s) 

the  result  follows  directly,  taking  the  properties  of  join  into  account. 


7  .. 


>  relevant  (*F;  <I>  h  a  :  <!>').  Thus 


Case  t  =  {t\  =  12)  >  Trivial. 

Case  t  =  conv  1 1'  \>  By  induction  hypothesis  we  get  that  relevant  (*F;  <I>  h  =  t2  :  Prop)  IT  relevant  (*F;  <I>  h  T  :  =12). 

By  inversion  of  relevancy  for  t\  =  t2  we  get  that  it  is  equal  to  relevant  (*F;  <I>  h  ti  :  Prop)  o  relevant  (*F;  <I>  h  t2  :  Prop)  T 
relevant  (*F;  <I>  h  t2  :  Prop).  Thus  the  desired  follows  trivially  using  the  properties  of  joining  contexts. 


Case  (rest)  >  Following  the  techniques  used  above. 

Part  3(b)  By  induction  on  the  derivation  of  *F;  <!>'  h  a  :  <I>. 
Case σ = • ▷ Trivial.


Case σ = σ', t' ▷ By induction hypothesis for σ', use of part 3(a) for t', and definition of relevancy for Φ.

Case σ = σ', id(X_i) ▷ By induction hypothesis for σ', and also using the side condition for X_i being
part of Φ': by inversion of well-formedness for Φ', we get that Ψ@i ⊑ relevant (Ψ ⊢ Φ' wf) and thus also
Ψ@i ⊑ relevant (Ψ; Φ' ⊢ σ : Φ), proving the desired.

Part  4(a)  By  induction  on  the  typing  derivation  for  t. 

Case  t  =  c  >  We  have  that  relevant  (*T;  ^\-  c  \t')  =  relevant  (*T  h  <I>  wf),  and  relevant  (*T;  <!>'  h  c  •  a  :  t'  •  a)  = 
relevant  (*T  h  <!>'  wf).  We  need  to  show  that  relevant  (*T  h  <I>  wf)  C  relevant  (*T  h  <!>'  wf)  o  relevant  (*T;  <!>'  h  a  :  <I>). 

We  have  that  relevant  (*T  h  <!>'  wf)  C  relevant  (*T;  <!>'  h  a  :  <I>),  so  the  join  in  the  above  equality  is  well-defined; 
from  the  properties  of  join  it  is  evident  that  it  is  enough  to  show  relevant  (*TI-  <I>  wf)  C  relevant  (*T;  <!>'  h  a  :  <I>) . 

This  is  trivially  proved  by  part  3(b). 

Case  r  =  5  >  Similarly. 

Case  t  =  f\  >  We  have  that  relevant  (*T;  ^'  \-  f\-o  -.t'  -o)  =  relevant  (*T;  O'  h  o.i :  t'  •  o).  By  inversion  for  o, 
we  have  that  relevant  (*T;  O'  h  o.i :  O./  •  o)  IT  relevant  (*T;  O'  h  a  :  O).  Thus  the  desired  directly  follows. 

Case  t  =  Yl{ti).t2  >  By  induction  hypothesis  for  [r2l  and  o  =  o,  we  get  that: 

relevant  (*T;  O'Ti  -cj  H  |'t2l  •  (cr,  /j<j)|)  :  /')  T  relevant  (*T;  O',  fi  -  a  h  |'f2l  :  5")  o relevant  (*T;  O',  -  a  h  (a,  /j<j>|)  :  (O,  fi)). 

Also  we  have  that  relevant  (*T;  O'  h  a  :  O)  T  relevant  (*T;  O',  -  a  h  a  :  O).  Using  the  known  properties  of 

freshening  and  substitutions,  we  know  that  relevant  (*T;  O' h  r  •  a  :/')  =  relevant  (*T;  0',ri-al-  [r2l  •  (cr,  :s"), 
thus  this  is  the  desired. 

Case  t  =  ?i(ti).t2  >  Similar  to  the  above. 

Case  t  =  tit2  \>  By  induction  hypothesis  we  get  that: 

relevant  (*T;  O'  h  fi  -  a  :  n(r  •a).(r'  -  a))  □  relevant  (*T;  O  h  :  n(r).r')  o  relevant  (*T;  O'  h  a  :  O),  and  that 
relevant  (*T;  <t>'  \-  t2-o  :  t  -o)  U  relevant  (*T;  O  h  r2  :  0  o  relevant  (*T;  O'  h  a  :  O).  Furthermore,  we  have  that 
relevant  (*F;  O'  h  ri  •af2  -CJ :  \t'  -o]-  (ld<i,/,  t2-o)) 

=  relevant  (*F;  O'  h  •  a  :  n(r  •  o).{t'  •  a))  o  relevant  (*T;  <t>'  \-  t2-o  :  t  -  o).  The  desired  follows  trivially,  using 
the  properties  of  join. 

Case  t  =Xi  I  o'  \>  Trivial,  using  part  4(b). 

Case  (rest)  >  Using  similar  techniques  as  above. 

Part  4(b)  By  induction  and  use  of  part  4(a). 

Lemma D.16 (Relevancy soundness) 1. If Ψ ⊢ T : K and relevant (Ψ ⊢ T : K) = Ψ̂ then Ψ̂ ⊢ T : K.

2. If Ψ ⊢ Φ wf and relevant (Ψ ⊢ Φ wf) = Ψ̂ then Ψ̂ ⊢ Φ wf.

3. If Ψ; Φ ⊢ t : t' and relevant (Ψ; Φ ⊢ t : t') = Ψ̂ then Ψ̂; Φ ⊢ t : t'.

4. If Ψ; Φ ⊢ σ : Φ' and relevant (Ψ; Φ ⊢ σ : Φ') = Ψ̂ then Ψ̂; Φ ⊢ σ : Φ'.

Part 1 By induction on the derivation of Ψ ⊢ T : K.

Case T = [Φ] t ▷ By part 3 we have that if relevant (Ψ; Φ ⊢ t : t') = Ψ̂ then Ψ̂; Φ ⊢ t : t'. From this we also
get that Ψ̂; Φ ⊢ t' : s, and thus it is trivial to construct a derivation of Ψ̂ ⊢ [Φ] t : [Φ] t'.

Case T = [Φ] Φ' ▷ From part 2 we get that if relevant (Ψ ⊢ Φ, Φ' wf) = Ψ̂, then Ψ̂ ⊢ Φ, Φ' wf, thus the desired
follows trivially.

Part 2 By induction on the derivation of Ψ ⊢ Φ wf.

Case Φ = • ▷ Trivially we have that unspec_Ψ ⊢ • wf.

Case Φ = Φ', t ▷ We have that if relevant (Ψ; Φ' ⊢ t : s) = Ψ̂ then Ψ̂; Φ' ⊢ t : s by part 3, and furthermore using
the implicit requirement that Φ' is well-formed, we also get that Ψ̂ ⊢ Φ' wf. Thus using the appropriate typing
rule we get Ψ̂ ⊢ (Φ', t) wf.

Case Φ = Φ', X_i ▷ By induction we get that if relevant (Ψ ⊢ Φ' wf) = Ψ̂ then Ψ̂ ⊢ Φ' wf, and thus also
Ψ̂ ∘ (Ψ@i) ⊢ Φ' wf. Furthermore, (Ψ̂ ∘ (Ψ@i)).i = Ψ.i. Thus using the appropriate well-formedness rule for
Φ we get that Ψ̂ ∘ (Ψ@i) ⊢ (Φ', X_i) wf.

Part 3 By induction on the derivation of Ψ; Φ ⊢ t : t'.

Case t = c ▷ Trivially we have that Ψ̂; Φ ⊢ c : t for any Ψ̂, Φ such that Ψ̂ ⊢ Φ wf, which holds for the
corresponding Ψ̂ based on part 2.

Case t = s ▷ Similarly as above.

Case t = f_i ▷ Again, as above.

Case  t  =  n(ti).t2  >  Simple  by  induetion  hypothesis  for  [t2l ,  and  also  from  the  faet  that  relevant  (*F;  <I>  h  ti  :  5)  C 
relevant  (*F  h  (<I>,  ti)  wf)  □  relevant  (*F;  <I>,  ti  h  |'t2l  -s')- 

Case  t  =  ?i(ti).t2  >  By  induetion  hypothesis  for  |'t2l,  if  relevant  (*F;  <I>,  ti  h  |'t2l  -  s')  =  we  get  that 
*F;  <I>,  ti  h  [t2l  :  s'.  Thus  we  also  have  that  *F;  <I>  h  ti  :  5,  and  also  that  either  t'  =  Type'  (whieh  is  an  impossible 

ease),  or  *F;  <I>,  ti  h  t'  :  s".  By  inversion  of  typing  for  *F;  <I>  h  n(ti).  \  t'\  :  s'  we  get  that  in  faet  s"  =  s' ,  and  thus 

it  is  easy  to  derive  *F;  <I>,  ti  h  t'  :  s'  and  *F;  <I>  h  n(ti).  \  t'\  :  s' .  From  these  we  get  the  desired  derivation  for 
$;<I>hX(ti).t2:n(ti).[t'J. 

Case t = t1 t2 ▷ Trivial by induction hypothesis for t1 and t2.

Case t = (t1 = t2) ▷ Again, trivial by induction hypothesis for t1 and t2, and also from the fact that Ψ̂; Φ ⊢ t1 : t
implies Ψ̂; Φ ⊢ t : Type.

Case t = X_i/σ ▷ From the first part (relevancy of T under the prefix context), we get that ⊢ Ψ̂ wf. Furthermore,
using part 4 we get that Ψ̂; Φ ⊢ σ : Φ'. Last, it is trivial to derive Ψ̂; Φ ⊢ X_i/σ : t' · σ using the same typing rule,
since Ψ̂.i = Ψ.i.

Part 4 By induction on the derivation of Ψ; Φ ⊢ σ : Φ'.
Case σ = • ▷ Trivial.


Case σ = σ', t ▷ Trivial by induction hypothesis and use of part 3.

Case σ = σ', id(X_i) ▷ By induction hypothesis we get Ψ̂; Φ ⊢ σ' : Φ'. Furthermore, from Ψ̂ ⊢ Φ wf and the fact
that Φ', X_i ⊆ Φ, we have by repeated typing inversions that Ψ@i ⊑ Ψ̂. Thus Ψ̂.i = Ψ.i, and we can construct a
derivation for Ψ̂; Φ ⊢ (σ', id(X_i)) : (Φ', X_i).


Definition D.17 Applying an extension substitution to a partial context is defined as follows, assuming that the
partial context does not contain extension variables bigger than X_{|Ψ|}.


Ψ̂ · σ_Ψ

(Ψ̂, K) · σ_Ψ = Ψ̂ · σ_Ψ, K · (σ_Ψ, X_{|Ψ'|}, ···, X_{|Ψ'|+|Ψ̂|})
(Ψ̂, ?) · σ_Ψ = Ψ̂ · σ_Ψ, ?


Lemma D.18 (Relevancy and extension substitution) In each item, assume Ψ' ⊢ σ_Ψ : Ψ and
σ'_Ψ = σ_Ψ, X_{|Ψ'|}, ···, X_{|Ψ'|+|Ψ_u|}.

1. If unspec_Ψ, Ψ_u ⊑ relevant (Ψ, Ψ_u ⊢ T : K), then unspec_{Ψ'}, Ψ_u · σ_Ψ ⊑ relevant (Ψ', Ψ_u · σ_Ψ ⊢ T · σ'_Ψ : K · σ'_Ψ).

2. If unspec_Ψ, Ψ_u ⊑ relevant (Ψ, Ψ_u ⊢ Φ wf), then unspec_{Ψ'}, Ψ_u · σ_Ψ ⊑ relevant (Ψ', Ψ_u · σ_Ψ ⊢ Φ · σ'_Ψ wf).

3. If unspec_Ψ, Ψ_u ⊑ relevant (Ψ, Ψ_u; Φ ⊢ t : t'), then
unspec_{Ψ'}, Ψ_u · σ_Ψ ⊑ relevant (Ψ', Ψ_u · σ_Ψ; Φ · σ'_Ψ ⊢ t · σ'_Ψ : t' · σ'_Ψ).

4. If unspec_Ψ, Ψ_u ⊑ relevant (Ψ, Ψ_u; Φ ⊢ σ : Φ'), then
unspec_{Ψ'}, Ψ_u · σ_Ψ ⊑ relevant (Ψ', Ψ_u · σ_Ψ; Φ · σ'_Ψ ⊢ σ · σ'_Ψ : Φ' · σ'_Ψ).

Part  1  By  induction  on  the  typing  derivation  of  T,  and  use  of  parts  2  and  3. 

Part 2 By induction on the well-formedness derivation of Φ.

Case Φ = • ▷ Trivial.

Case Φ = Φ', t ▷ Using part 3 we get the desired result.

Case Φ = Φ', X_i ▷

We have that unspec_Ψ, Ψ̂_u ⊑ relevant(Ψ, Ψ_u ⊢ Φ' wf) ∘ ((Ψ, Ψ_u)@i).

We split cases based on whether i < |Ψ| or not.

In the first case:

We trivially have unspec_Ψ, Ψ̂_u ⊑ relevant(Ψ, Ψ_u ⊢ Φ' wf), thus directly by use of the induction
hypothesis and the same rule for relevancy we get the desired.

In the second case:

Assume without loss of generality Ψ̂'_u such that unspec_Ψ, Ψ̂'_u ⊑ relevant(Ψ, Ψ_u ⊢ Φ' wf), and
(unspec_Ψ, Ψ̂_u) = (unspec_Ψ, Ψ̂'_u) ∘ ((Ψ, Ψ_u)@i).

Then by induction hypothesis get that unspec_Ψ', Ψ̂'_u · σ_Ψ ⊑ relevant(Ψ', Ψ_u · σ_Ψ ⊢ Φ' · σ'_Ψ wf).

Now we have that (Φ', X_i) · σ'_Ψ = Φ' · σ'_Ψ, X_(i-|Ψ|+|Ψ'|).

Thus relevant(Ψ', Ψ_u · σ_Ψ ⊢ (Φ', X_i) · σ'_Ψ wf) = relevant(Ψ', Ψ_u · σ_Ψ ⊢ (Φ' · σ'_Ψ, X_(i-|Ψ|+|Ψ'|)) wf) =
relevant(Ψ', Ψ_u · σ_Ψ ⊢ Φ' · σ'_Ψ wf) ∘ ((Ψ', Ψ_u · σ_Ψ)@(i - |Ψ| + |Ψ'|)).

Thus we have that (unspec_Ψ', Ψ̂'_u · σ_Ψ) ∘ ((Ψ', Ψ_u · σ_Ψ)@(i - |Ψ| + |Ψ'|)) ⊑
relevant(Ψ', Ψ_u · σ_Ψ ⊢ (Φ', X_i) · σ'_Ψ wf).

But (unspec_Ψ', Ψ̂'_u · σ_Ψ) ∘ ((Ψ', Ψ_u · σ_Ψ)@(i - |Ψ| + |Ψ'|)) = unspec_Ψ', Ψ̂_u · σ_Ψ.

This is because (unspec_Ψ, Ψ̂_u) = (unspec_Ψ, Ψ̂'_u) ∘ ((Ψ, Ψ_u)@i), so the i-th element is the only one
where unspec_Ψ, Ψ̂'_u might differ from unspec_Ψ, Ψ̂_u; this will be the (i - |Ψ| + |Ψ'|)-th element after
σ_Ψ is applied; and that element is definitely equal after the join.

Part 3 By induction on the typing derivation for t.

Case t = c, s, or v_i ▷ Trivial using part 2.

Case t = Π(t1).t2 ▷ By induction hypothesis for ⌈t2⌉.

Case t = λ(t1).t2 ▷ By induction hypothesis for ⌈t2⌉.

Case t = t1 t2 ▷ Assume Ψ̂_1 and Ψ̂_2 such that Ψ̂_u = Ψ̂_1 ∘ Ψ̂_2. Then use induction hypothesis for t1 and t2. Last
combine the results using join to get the desired, noticing that both Ψ̂_1 · σ_Ψ and Ψ̂_2 · σ_Ψ are ⊑ Ψ_u · σ_Ψ (so join is
defined between them), and also that (Ψ̂_1 ∘ Ψ̂_2) · σ_Ψ = Ψ̂_1 · σ_Ψ ∘ Ψ̂_2 · σ_Ψ.

Case t = X_i/σ ▷

We split cases based on whether i < |Ψ| or not. In case it is, the proof is trivial using part 4. We thus focus on
the case where i ≥ |Ψ|.

We  have  fhaf  unspec>j<,  *B  □  relevant  ((*B,*Bj,)  [i\-  [O']  t :  [O']  5)  o  relevant  (*B,  O  h  a  :  O')  o  (('B,  *B„)@/). 

‘t'|+|'t'„|— !  times 

Assume  such  that  (T'i  =  %  ),  unspec>i„  %  C  relevant (('B,'B„) [,h  [O'] t :  [O'] s), 

unspec*!,,  $2  C  relevant  (^B,  »B„;  O  h  a  :  O')  and  last  that  ^B  =  rB],  o»b2o  ((»B,  'B„@/)). 

By  induction  hypothesis  for  [O']  t  we  get  that: 

unspecxj.,,  rB^^-avi/ C  relevant ((^^',^^^,-011/) [, I-  [O' •  ay  t  •  a[j, :  [O' -ay  ^-alj,). 

By  induction  hypothesis  for  a  we  get  that: 

unspec>j</,  *By  avp  L  relevant  (*B',  rB^,  •  a>i<;  O  •  a(j,  h  a  •  aij, :  O'  •  a(j,) 

We  combine  the  above  to  get  the  desired,  using  the  properties  of  join  at  @  as  we  did  earlier. 

Case  (rest)  >  Similar  to  the  above  cases. 

Part  4  Similar  as  above. 

D.4  Unification 

Here, we are matching a term with some unification variables against a closed term. Therefore we will use
typing judgements like Ψ ⊢p T : K instead of Ψ', Ψ_u ⊢p T : K, as we did above. The single Ψ that we use
actually corresponds to Ψ_u; the normal context Ψ' is empty.

First, we need to define the notion of partial substitutions, corresponding to substitutions for partial contexts
as defined above.

Definition D.19 (Partial substitutions) The syntax for partial substitutions follows.

σ̂_Ψ ::= • | σ̂_Ψ, T | σ̂_Ψ, ?

Definition D.20 Joining two partial substitutions is defined below.

σ̂_Ψ ∘ σ̂'_Ψ

• ∘ • = •
(σ̂_Ψ, T) ∘ (σ̂'_Ψ, T) = (σ̂_Ψ ∘ σ̂'_Ψ), T
(σ̂_Ψ, ?) ∘ (σ̂'_Ψ, T) = (σ̂_Ψ ∘ σ̂'_Ψ), T
(σ̂_Ψ, T) ∘ (σ̂'_Ψ, ?) = (σ̂_Ψ ∘ σ̂'_Ψ), T
(σ̂_Ψ, ?) ∘ (σ̂'_Ψ, ?) = (σ̂_Ψ ∘ σ̂'_Ψ), ?


Definition D.21 Comparing two partial substitutions is defined below.

σ̂_Ψ ⊑ σ̂'_Ψ

(σ̂_Ψ, T) ⊑ (σ̂'_Ψ, T) ⇐ σ̂_Ψ ⊑ σ̂'_Ψ
(σ̂_Ψ, ?) ⊑ (σ̂'_Ψ, T) ⇐ σ̂_Ψ ⊑ σ̂'_Ψ
(σ̂_Ψ, ?) ⊑ (σ̂'_Ψ, ?) ⇐ σ̂_Ψ ⊑ σ̂'_Ψ
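
The join and comparison of Definitions D.20 and D.21 can be illustrated with a small Python sketch. It is only an illustration under assumptions that are not in the original: a partial substitution is modelled as a list whose entries are either an instantiation (here an arbitrary string) or `None` standing for `?`, and the names `join` and `less_specified` are hypothetical.

```python
from typing import List, Optional

# Assumption: a partial substitution is a list; None plays the role of "?".
PartialSubst = List[Optional[str]]


def join(a: PartialSubst, b: PartialSubst) -> Optional[PartialSubst]:
    """Join two partial substitutions position by position.

    Returns None when the join is undefined, i.e. when the lengths differ
    or some position is specified differently on the two sides.
    """
    if len(a) != len(b):
        return None
    out: PartialSubst = []
    for x, y in zip(a, b):
        if x is None:
            out.append(y)          # (?, T) gives T and (?, ?) gives ?
        elif y is None or x == y:
            out.append(x)          # (T, ?) gives T and (T, T) gives T
        else:
            return None            # both specified, but not identically
    return out


def less_specified(a: PartialSubst, b: PartialSubst) -> bool:
    """Comparison: every position specified in a is specified identically in b."""
    return len(a) == len(b) and all(x is None or x == y for x, y in zip(a, b))
```

For instance, joining `["A", None]` with `[None, "B"]` yields `["A", "B"]`, while joining `["A"]` with `["C"]` is undefined.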


Definition D.22 The fully unspecified substitution for a specific partial context is defined as:

unspec_Ψ̂ = σ̂_Ψ

unspec_• = •
unspec_(Ψ̂, K) = unspec_Ψ̂, ?
unspec_(Ψ̂, ?) = unspec_Ψ̂, ?


Definition D.23 Applying a partial extension substitution to a term, a context, or a substitution is entirely
identical to normal substitution. It fails when a metavariable that is left unspecified in the extension substitution
gets used, something that already happens from the existing definition B.77.

Definition D.24 Replacing an unspecified element of a partial substitution with another works as follows.

σ̂_Ψ[i ↦ T] = σ̂'_Ψ

(σ̂_Ψ, ?)[i ↦ T] = σ̂_Ψ, T when i = |σ̂_Ψ|
(σ̂_Ψ, T')[i ↦ T] = σ̂_Ψ[i ↦ T], T' when i < |σ̂_Ψ|
(σ̂_Ψ, ?)[i ↦ T] = σ̂_Ψ[i ↦ T], ? when i < |σ̂_Ψ|

Definition D.25 Limiting a partial substitution to a specific partial context works as follows; we assume
|σ̂_Ψ| = |Ψ̂|.

σ̂_Ψ|_Ψ̂

•|_• = •
(σ̂_Ψ, T)|_(Ψ̂, ?) = σ̂_Ψ|_Ψ̂, ?
(σ̂_Ψ, T)|_(Ψ̂, K) = σ̂_Ψ|_Ψ̂, T
(σ̂_Ψ, ?)|_(Ψ̂, ?) = σ̂_Ψ|_Ψ̂, ?
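
The three operations of Definitions D.22, D.24 and D.25 can be sketched in Python as follows. This is a hedged illustration, not the report's implementation: partial substitutions and partial contexts are both modelled as lists with `None` for `?`, positions are 0-based, and the function names `unspec`, `replace` and `limit` are assumptions chosen for readability.

```python
from typing import List, Optional

Partial = List[Optional[str]]  # assumption: None plays the role of "?"


def unspec(n: int) -> Partial:
    """Fully unspecified substitution for a context of length n."""
    return [None] * n


def replace(sub: Partial, i: int, value: str) -> Partial:
    """Replace the unspecified element at position i by value.

    Only defined when position i exists and is currently unspecified.
    """
    if not (0 <= i < len(sub)) or sub[i] is not None:
        raise ValueError("position is out of range or already specified")
    return sub[:i] + [value] + sub[i + 1:]


def limit(sub: Partial, ctx: Partial) -> Partial:
    """Keep only the instantiations at positions that ctx specifies."""
    if len(sub) != len(ctx):
        raise ValueError("substitution and context must have equal length")
    out: Partial = []
    for s, k in zip(sub, ctx):
        if k is None:
            out.append(None)       # the context leaves this position open
        elif s is None:
            raise ValueError("context needs a position the substitution lacks")
        else:
            out.append(s)
    return out
```

A typical use is `replace(unspec(2), 1, "T")`, which gives `[None, "T"]`, followed by `limit` to forget instantiations that a smaller relevant context does not mention.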

Definition D.26 Typing for partial substitutions is defined below.

• ⊢ σ̂_Ψ : Ψ̂

    • ⊢p σ̂_Ψ : Ψ̂    • ⊢p T : K · σ̂_Ψ
    ---------------------------------
    • ⊢p (σ̂_Ψ, T) : (Ψ̂, K)

    • ⊢p σ̂_Ψ : Ψ̂
    ---------------------------------
    • ⊢p (σ̂_Ψ, ?) : (Ψ̂, ?)

Lemma D.27 If • ⊢ σ̂_Ψ1 : Ψ̂_1 and • ⊢ σ̂_Ψ2 : Ψ̂_2, with Ψ̂_1 ∘ Ψ̂_2 and σ̂_Ψ1 ∘ σ̂_Ψ2 defined, then • ⊢ σ̂_Ψ1 ∘ σ̂_Ψ2 : Ψ̂_1 ∘ Ψ̂_2.

By induction on the derivation of σ̂_Ψ1 ∘ σ̂_Ψ2 = σ̂_Ψ.

Case • ∘ • ▷ Trivial, since Ψ̂_1 = Ψ̂_2 = • by typing inversion.

Case (σ̂'_Ψ1, T) ∘ (σ̂'_Ψ2, T) ▷ By typing inversion get Ψ̂_1 = Ψ̂'_1, K with T : K, and Ψ̂_2 = Ψ̂'_2, K with T : K.
Thus Ψ̂_1 ∘ Ψ̂_2 = (Ψ̂'_1 ∘ Ψ̂'_2), K, and by induction hypothesis for σ̂'_Ψ1, σ̂'_Ψ2 and typing it is easy to prove the desired.

Case (σ̂'_Ψ1, ?) ∘ (σ̂'_Ψ2, T) ▷ By typing inversion get Ψ̂_1 = Ψ̂'_1, ?, and Ψ̂_2 = Ψ̂'_2, K with T : K. Thus Ψ̂_1 ∘ Ψ̂_2 =
(Ψ̂'_1 ∘ Ψ̂'_2), K, and by induction hypothesis for σ̂'_Ψ1, σ̂'_Ψ2 and typing it is easy to prove the desired.

Case (σ̂'_Ψ1, T) ∘ (σ̂'_Ψ2, ?) ▷ Similar to the above.

Case (σ̂'_Ψ1, ?) ∘ (σ̂'_Ψ2, ?) ▷ Again by induction hypothesis and the fact that Ψ̂_1 = Ψ̂'_1, ? and Ψ̂_2 = Ψ̂'_2, ? by typing
inversion.


Lemma D.28 If • ⊢ σ̂_Ψ : Ψ̂, • ⊢ Ψ̂' wf and Ψ̂' ⊑ Ψ̂, then σ̂_Ψ|_Ψ̂' ⊑ σ̂_Ψ and • ⊢ σ̂_Ψ|_Ψ̂' : Ψ̂'.

Trivial by induction on the derivation of σ̂_Ψ|_Ψ̂'.

Now we are ready to proceed to a proof about the fact that either a unique unification partial substitution
exists for patterns and terms that are typed under the restrictive typing, or that no such substitution exists. The
constructive content of this proof will be our unification procedure.

To prove the following theorem we assume that if Ψ; Φ ⊢p t : t', with t' ≠ Type', the derivation Ψ; Φ ⊢p t' : s
for a suitable s is a sub-derivation of the derivation Ψ; Φ ⊢p t : t'. The way we have written our rules this
is actually not true, but an adaptation where the t' : s derivation becomes part of the t : t' derivation is
possible, thanks to theorem B.68.

Theorem D.29 (Decidability and determinism of unification) 1. If Ψ ⊢p Φ wf, • ⊢p Φ' wf, and relevant(Ψ ⊢p Φ wf) = Ψ̂,
then there either exists a unique substitution σ̂_Ψ such that • ⊢ σ̂_Ψ : Ψ̂ and Φ · σ̂_Ψ = Φ', or no such
substitution exists.

2. If Ψ; Φ ⊢p t : t_T, •; Φ' ⊢p t' : t'_T and relevant(Ψ; Φ ⊢ t : t_T) = Ψ̂', then:

assuming that Ψ; Φ ⊢p t_T : s, •; Φ' ⊢p t'_T : s, relevant(Ψ; Φ ⊢p t_T : s) = Ψ̂ (or, if t_T = Type', that Ψ ⊢p Φ wf,
• ⊢p Φ' wf, relevant(Ψ ⊢p Φ wf) = Ψ̂) and there exists a unique substitution σ̂_Ψ such that • ⊢ σ̂_Ψ : Ψ̂,
Φ · σ̂_Ψ = Φ' and t_T · σ̂_Ψ = t'_T,

then there either exists a unique substitution σ̂'_Ψ such that • ⊢ σ̂'_Ψ : Ψ̂', Φ · σ̂'_Ψ = Φ', t_T · σ̂'_Ψ = t'_T and
t · σ̂'_Ψ = t', or no such substitution exists.

3. If Ψ ⊢p T : K, • ⊢p T' : K and relevant(Ψ ⊢ T : K) = Ψ, then either there exists a unique substitution σ_Ψ
such that • ⊢ σ_Ψ : Ψ and T · σ_Ψ = T', or no such substitution exists.

Part  2  By  induction  on  the  typing  derivation  for  t. 


Case

    c : t_T ∈ Σ
    ------------------ ▷
    Ψ; Φ ⊢p c : t_T

We have t · σ̂'_Ψ = c · σ̂'_Ψ = c. So for any substitution to satisfy the desired properties we need to have that t' = c
also; if this isn't so, no σ̂'_Ψ possibly exists. If we have that t = t' = c, then the desired is proved directly by
assumption, considering that relevant(Ψ; Φ ⊢p c : t_T) = relevant(Ψ; Φ ⊢p t_T : s) = relevant(Ψ ⊢p Φ wf) (since
t_T comes from the definitions context and can therefore not contain extension variables).

Case

    Φ.I = t_T
    ------------------ ▷
    Ψ; Φ ⊢p f_I : t_T

Similarly as above. First, we need t' = f_I', otherwise no suitable σ̂'_Ψ exists. From assumption we have a unique
σ̂_Ψ for relevant(Ψ; Φ ⊢p t_T : s). If I · σ̂_Ψ = I', then σ̂_Ψ has all the desired properties for σ̂'_Ψ, considering the
fact that relevant(Ψ; Φ ⊢p f_I : t_T) = relevant(Ψ ⊢p Φ wf) and relevant(Ψ; Φ ⊢p t_T : s) = relevant(Ψ ⊢p Φ wf)
(since t_T = Φ.I). It is also unique, because an alternate σ̂'_Ψ would violate the assumed uniqueness of σ̂_Ψ. If
I · σ̂_Ψ ≠ I', no suitable substitution exists, because of the same reason.

Case

    (s, s') ∈ A
    ------------------ ▷
    Ψ; Φ ⊢p s : s'

Entirely similar to the case for c.


Case

    Ψ; Φ ⊢p t1 : s    Ψ; Φ, t1 ⊢p ⌈t2⌉_|Φ| : s'    (s, s', s'') ∈ R
    ---------------------------------------------------------------- ▷
    Ψ; Φ ⊢p Π(t1).t2 : s''

First, we have either that t' = Π(t1').t2', or no suitable σ̂'_Ψ exists. Thus by inversion for t' we get:

•; Φ' ⊢p t1' : s*, •; Φ', t1' ⊢p ⌈t2'⌉_|Φ'| : s'*, (s*, s'*, s'') ∈ R.

Now, we need s = s*, otherwise no suitable σ̂'_Ψ possibly exists. To see why this is so, assume that a σ̂'_Ψ
satisfying the necessary conditions exists, and s ≠ s*; then we have that t1 · σ̂'_Ψ = t1', which means that their types
should also match, a contradiction.

We use the induction hypothesis for t1 and t1'. We are allowed to do so because relevant(Ψ; Φ ⊢p s'' : s''') =
relevant(Ψ; Φ ⊢p s : s''''), and the other properties for σ̂_Ψ also hold trivially.

From that we either get a σ̂'_Ψ such that: • ⊢ σ̂'_Ψ : Ψ̂', where Ψ̂' = relevant(Ψ; Φ ⊢p t1 : s) and t1 · σ̂'_Ψ = t1',
Φ · σ̂'_Ψ = Φ'. Since a partial substitution unifying t with t' will also include a substitution that only has to
do with Ψ̂', we see that if no σ̂'_Ψ is returned by the induction hypothesis, no suitable substitution for t and t'
actually exists.

We can now use the induction hypothesis for t2 and σ̂'_Ψ, since relevant(Ψ; Φ, t1 ⊢p s' : s''''') = relevant(Ψ; Φ ⊢p t1 : s),
and the other requirements trivially hold. Especially for s' and s'* being equal, this is trivial since both need to
be equal to s'' (because of the form of our rule set R).

From that we either get a σ̂''_Ψ such that • ⊢ σ̂''_Ψ : Ψ̂'', ⌈t2⌉_|Φ| · σ̂''_Ψ = ⌈t2'⌉_|Φ'|, (Φ, t1) · σ̂''_Ψ = Φ', t1', or
that such a σ̂''_Ψ does not exist. In the second case we proceed as above, so we focus on the first case.

By use of properties of freshening (like injectivity) we are led to the fact that (Π(t1).t2) · σ̂''_Ψ = Π(t1').(t2'),
so the returned σ̂''_Ψ has the desired properties, if we consider the fact that relevant(Ψ; Φ ⊢p Π(t1).t2 : s'') =
relevant(Ψ; Φ, t1 ⊢p ⌈t2⌉_|Φ| : s').


Case

    Ψ; Φ ⊢p t1 : s    Ψ; Φ, t1 ⊢p ⌈t2⌉_|Φ| : t3    Ψ; Φ ⊢p Π(t1).⌊t3⌋_|Φ| : s'
    --------------------------------------------------------------------------- ▷
    Ψ; Φ ⊢p λ(t1).t2 : Π(t1).⌊t3⌋_|Φ|

We have that either t' = λ(t1').t2', or no suitable σ̂'_Ψ exists. Thus by typing inversion for t' we get:
•; Φ' ⊢p t1' : s*, •; Φ', t1' ⊢p ⌈t2'⌉_|Φ'| : t3', •; Φ' ⊢p Π(t1').⌊t3'⌋_|Φ'| : s'*.

By assumption we have that there exists a unique σ̂_Ψ such that • ⊢ σ̂_Ψ : Ψ̂, Φ · σ̂_Ψ = Φ',
(Π(t1).⌊t3⌋) · σ̂_Ψ = Π(t1').⌊t3'⌋, if relevant(Ψ; Φ ⊢p Π(t1).⌊t3⌋_|Φ| : s') = Ψ̂. From that
we also get that s' = s'*.

From the fact that (Π(t1).⌊t3⌋) · σ̂_Ψ = Π(t1').⌊t3'⌋ we get first of all that t1 · σ̂_Ψ = t1', and also that t3 · σ̂_Ψ = t3'.

Furthermore, we have that relevant(Ψ; Φ ⊢p Π(t1).⌊t3⌋ : s') = relevant(Ψ; Φ, t1 ⊢p t3 : s').

From that we understand that σ̂_Ψ is a suitable substitution to use for the induction hypothesis for ⌈t2⌉.

Thus from induction hypothesis we either get a unique σ̂'_Ψ with the properties: • ⊢ σ̂'_Ψ : Ψ̂', ⌈t2⌉ · σ̂'_Ψ = ⌈t2'⌉,
(Φ, t1) · σ̂'_Ψ = Φ', t1', t3 · σ̂'_Ψ = t3', if relevant(Ψ; Φ, t1 ⊢p ⌈t2⌉_|Φ| : t3) = Ψ̂', or that no such substitution exists.
We focus on the first case; in the second case no unifying substitution for t and t' exists, otherwise the lack of
existence of a suitable σ̂'_Ψ would lead to a contradiction.

This substitution σ̂'_Ψ has the desired properties with respect to unification of t against t' (again using the
properties of freshening, like injectivity), and it is unique, because the existence of an alternate substitution
with the same properties would violate the uniqueness assumption of the substitution returned by induction
hypothesis.


Case

    Ψ; Φ ⊢p t1 : Π(ta).tb    Ψ; Φ ⊢p t2 : ta
    ------------------------------------------- ▷
    Ψ; Φ ⊢p t1 t2 : ⌈tb⌉_|Φ| · (id_Φ, t2)

Again we have that either t' = t1' t2', or no suitable substitution possibly exists. Thus by inversion of typing for t'
we get:

•; Φ' ⊢p t1' : Π(ta').tb', •; Φ' ⊢p t2' : ta', t'_T = ⌈tb'⌉_|Φ'| · (id_Φ', t2').

Furthermore we have that Ψ; Φ ⊢p Π(ta).tb : s and •; Φ' ⊢p Π(ta').tb' : s' for suitable s, s'. We need s = s',
otherwise no suitable σ̂'_Ψ exists (because if t1 and t1' were unifiable by substitution, their Π-types would match, and
also their sorts, which is a contradiction).

We can use the induction hypothesis for Π(ta).tb and Π(ta').tb', with the partial substitution σ̂_Ψ limited only to
those variables relevant in Ψ ⊢p Φ wf. In that case all of the requirements for σ̂_Ψ hold (the uniqueness condition
also holds for this substitution, using part 1 for the fact that Φ and Φ' only have a unique unification
substitution), so we get from the induction hypothesis either a σ̂'_Ψ for Ψ̂' = relevant(Ψ; Φ ⊢p Π(ta).tb : s) such that
Φ · σ̂'_Ψ = Φ' and (Π(ta).tb) · σ̂'_Ψ = Π(ta').tb', or that no such σ̂'_Ψ exists. In the second case, again we can show
that no suitable substitution for t and t' exists; so we focus on the first case.

We can now use the induction hypothesis for t1, using this σ̂'_Ψ. From that, we get that either a σ̂_Ψ1 exists for
Ψ̂_1 = relevant(Ψ; Φ ⊢p t1 : Π(ta).tb) such that t1 · σ̂_Ψ1 = t1' etc., or that no such σ̂_Ψ1 exists, in which case we
argue that no global σ̂''_Ψ exists for unifying t and t' (because we could limit it to the Ψ̂_1 variables and yield a
contradiction).

We now form σ̂''_Ψ which is the limitation of σ̂'_Ψ to the context Ψ̂'' = relevant(Ψ; Φ ⊢p ta : s*). For that, we
have that • ⊢p σ̂''_Ψ : Ψ̂'', Φ · σ̂''_Ψ = Φ' and ta · σ̂''_Ψ = ta'. Also it is the unique substitution with those properties,
otherwise the induction hypothesis for ta would be violated.

Using σ̂''_Ψ we can allude to the induction hypothesis for t2, which either yields a substitution σ̂_Ψ2 for
Ψ̂_2 = relevant(Ψ; Φ ⊢p t2 : ta), such that t2 · σ̂_Ψ2 = t2', or that no such substitution exists, which we prove
implies no global unifying substitution exists.

Having now the σ̂_Ψ1 and σ̂_Ψ2 specified above, we consider the substitution σ̂_Ψr = σ̂_Ψ1 ∘ σ̂_Ψ2. This
substitution, if it exists, has the desired properties: we have that Ψ̂_r = relevant(Ψ; Φ ⊢p t1 t2 : ⌈tb⌉ · (id_Φ, t2)) =
relevant(Ψ; Φ ⊢p t1 : Π(ta).tb) ∘ relevant(Ψ; Φ ⊢p t2 : ta), and thus • ⊢ σ̂_Ψr : Ψ̂_r. Also, (t1 t2) · σ̂_Ψr = t1' t2',
t_T · σ̂_Ψr = t'_T (because tb · σ̂_Ψr = tb' etc.), and Φ · σ̂_Ψr = Φ'. It is also unique: if another substitution had the
same properties, we could limit it to either the relevant variables for t1 or t2 and get a contradiction. Thus this is
the desired substitution.

If σ̂_Ψr does not exist, then no suitable substitution for unifying t and t' exists. This is again because we could
limit any potential such substitution to two parts, σ̂_Ψ1 and σ̂_Ψ2 (for Ψ̂_1 and Ψ̂_2 respectively), violating the
uniqueness of the substitutions yielded by the induction hypothesis.

Case

    Ψ; Φ ⊢p t1 : ta    Ψ; Φ ⊢p t2 : ta    Ψ; Φ ⊢p ta : Type
    -------------------------------------------------------- ▷
    Ψ; Φ ⊢p t1 = t2 : Prop

Similarly as above. First assume that t' = (t1' = t2') with t1' : ta', t2' : ta' and ta' : Type. Then, by induction hypothesis
get a unifying substitution σ̂'_Ψ for ta and ta'. Use that σ̂'_Ψ in order to allude to the induction hypothesis for t1 and
t2 independently, yielding substitutions σ̂_Ψ1 and σ̂_Ψ2. Last, claim that the globally required substitution must
actually be equal to σ̂_Ψ1 ∘ σ̂_Ψ2.

Case

    (Ψ).i = T    T = [Φ*] t_T    Ψ; Φ ⊢p σ : Φ*    Φ* ⊆ Φ    σ = id_Φ*
    ------------------------------------------------------------------- ▷
    Ψ; Φ ⊢p X_i/σ : t_T · σ

We trivially have t_T · σ = t_T. We split cases depending on whether σ̂_Ψ.i = ? or not. If it is unspecified:

We split cases further depending on whether t' uses any variables higher than |Φ* · σ̂_Ψ| - 1 or not.
That is, if t' <f |Φ* · σ̂_Ψ| or not. In the case where this doesn't hold, it is obvious that there is no
possible σ̂'_Ψ such that (X_i/σ) · σ̂'_Ψ = t', since σ̂'_Ψ must include σ̂_Ψ, and the term (X_i/σ) · σ̂'_Ψ can
therefore not include variables outside the prefix Φ* · σ̂_Ψ of Φ · σ̂_Ψ.

In the case where t' <f |Φ* · σ̂_Ψ|, we consider the substitution σ̂'_Ψ = σ̂_Ψ[i ↦ [Φ*] t']. In that
case we obviously have Φ · σ̂'_Ψ = Φ', t_T · σ̂'_Ψ = t'_T, and also t · σ̂'_Ψ = t'. Also, Ψ̂' =
relevant(Ψ; Φ ⊢p X_i/σ : t_T · σ) = (relevant(Ψ|_i; Φ* ⊢p t_T : s), ?, ···, ?) ∘ relevant(Ψ; Φ ⊢p σ : Φ*) ∘
(Ψ@i).

We need to show that • ⊢ σ̂'_Ψ : Ψ̂'. First, we have that relevant(Ψ; Φ ⊢p σ : Φ*) =
relevant(Ψ ⊢p Φ wf) since Φ* ⊆ Φ. Second, we have that relevant(Ψ; Φ ⊢p t_T : s) =
(relevant(Ψ|_i; Φ* ⊢p t_T : s), ?, ···, ?) ∘ relevant(Ψ ⊢p Φ wf). Thus we have that Ψ̂' = Ψ̂ ∘ (Ψ@i). It
is now trivial to see that indeed • ⊢ σ̂'_Ψ : Ψ̂'.

If σ̂_Ψ.i = [Φ*] t*, then we split cases on whether t* = t' or not. If it is, then obviously σ̂_Ψ is the desired unifying
substitution for which all the desired properties hold. If it is not, then no substitution with the desired properties
possibly exists, because it would violate the uniqueness assumption for σ̂_Ψ.

Case (rest) ▷ Similar techniques as above.

Part 1 By induction on the well-formedness derivation for Φ.


Case

    ------------------ ▷
    Ψ ⊢p • wf

Trivially, we either have Φ' = •, in which case unspec_Ψ is the unique substitution with the desired properties,
or no substitution possibly exists.

Case

    Ψ ⊢p Φ wf    Ψ; Φ ⊢p t : s
    ---------------------------- ▷
    Ψ ⊢p (Φ, t) wf

We either have that Φ' = Φ'', t' or no substitution possibly exists. By induction hypothesis get σ̂_Ψ such that
Φ · σ̂_Ψ = Φ'' and • ⊢ σ̂_Ψ : Ψ̂ with Ψ̂ = relevant(Ψ ⊢p Φ wf). Now we use part 2 to either get a σ̂'_Ψ which is obviously
the substitution that we want, since (Φ, t) · σ̂'_Ψ = Φ'', t' and relevant(Ψ ⊢p (Φ, t) wf) = relevant(Ψ; Φ ⊢p t : s);
or we get the fact that no such substitution possibly exists. In that case, we again conclude that no substitution
for the current case exists either, otherwise it would violate the induction hypothesis.

Case

    • ⊢p Φ wf    (Ψ).i = [Φ] ctx
    ------------------------------ ▷
    Ψ ⊢p Φ, X_i wf

We either have Φ' = Φ, Φ'', or no substitution possibly exists (since Φ does not depend on unification variables,
so we always have Φ · σ̂_Ψ = Φ). We now consider the substitution σ̂_Ψ = unspec_Ψ[i ↦ [Φ]Φ'']. We obviously
have that (Φ, X_i) · σ̂_Ψ = Φ, Φ'', and also that • ⊢ σ̂_Ψ : Ψ̂ with Ψ̂ = Ψ@i = relevant(Ψ ⊢p Φ, X_i wf). Thus this
substitution has the desired properties.


Part 3 By induction on the typing for T.

Case

    Ψ; Φ ⊢p t : t_T    Ψ; Φ ⊢ t_T : s
    ---------------------------------- ▷
    Ψ ⊢p [Φ] t : [Φ] t_T

By inversion of typing for T' we have: T' = [Φ] t', •; Φ ⊢p t' : t_T, •; Φ ⊢p t_T : s.

We obviously have Ψ̂ = relevant(Ψ; Φ ⊢p t_T : s) = unspec_Ψ, and the substitution σ̂_Ψ = unspec_Ψ is the unique
substitution such that • ⊢ σ̂_Ψ : Ψ̂, Φ · σ̂_Ψ = Φ and t_T · σ̂_Ψ = t_T. We can thus use part 2 for attempting unification
between t and t', yielding a σ̂'_Ψ such that • ⊢ σ̂'_Ψ : Ψ̂' with Ψ̂' = relevant(Ψ; Φ ⊢p t : t_T) and t · σ̂'_Ψ = t'. We
have that relevant(Ψ; Φ ⊢p t : t_T) = relevant(Ψ ⊢p T : K), thus Ψ̂' = Ψ by assumption. From that we realize
that σ̂'_Ψ is a fully-specified substitution since • ⊢ σ̂'_Ψ : Ψ, and thus this is the substitution with the desired
properties.

If unification between t and t' fails, it is trivial to see that no substitution with the desired properties exists,
otherwise it would lead directly to a contradiction.

Case

    Ψ ⊢p Φ, Φ' wf
    ------------------------ ▷
    Ψ ⊢p [Φ]Φ' : [Φ] ctx

By inversion of typing for T' we have: T' = [Φ]Φ'', • ⊢p Φ, Φ'' wf, • ⊢p Φ wf. From part 1 we get a σ_Ψ unifying
Φ, Φ' and Φ, Φ'', or the fact that no such σ_Ψ exists. In the first case, as above, it is easy to see that this is the
fully-specified substitution that we desire. In the second case, no suitable substitution exists, otherwise we are
led directly to a contradiction.


The above proof is constructive. Its computational content is actually a unification algorithm for our patterns.
We illustrate the algorithm below by giving its unification rules; notice that it follows the inductive
structure of the proof (and makes the same assumption about types-of-types being subderivations). If a
derivation according to the following rules is not possible, the algorithm returns failure.


Definition D.30 (Unification algorithm) We give the rules for the unification algorithm below.

(Ψ ⊢p T : K) ~ (• ⊢p T' : K) ▷ σ_Ψ

    (Ψ; Φ ⊢p t : t_T) ~ (•; Φ ⊢p t' : t_T) ◁ unspec_Ψ ▷ σ_Ψ
    ----------------------------------------------------------
    (Ψ ⊢p [Φ] t : [Φ] t_T) ~ (• ⊢p [Φ] t' : [Φ] t_T) ▷ σ_Ψ

    (Ψ ⊢p Φ, Φ' wf) ~ (• ⊢p Φ, Φ'' wf) ▷ σ_Ψ
    ----------------------------------------------------------
    (Ψ ⊢p [Φ]Φ' : [Φ] ctx) ~ (• ⊢p [Φ]Φ'' : [Φ] ctx) ▷ σ_Ψ

(Ψ ⊢p Φ wf) ~ (• ⊢ Φ' wf) ▷ σ̂_Ψ

    (Ψ ⊢p • wf) ~ (• ⊢ • wf) ▷ unspec_Ψ

    (Ψ ⊢p Φ wf) ~ (• ⊢p Φ' wf) ▷ σ̂_Ψ    (Ψ; Φ ⊢p t : s) ~ (•; Φ' ⊢ t' : s) ◁ σ̂_Ψ ▷ σ̂'_Ψ
    ----------------------------------------------------------
    (Ψ ⊢p (Φ, t) wf) ~ (• ⊢ (Φ', t') wf) ▷ σ̂'_Ψ

    (Ψ ⊢p Φ, X_i wf) ~ (• ⊢ Φ, Φ' wf) ▷ unspec_Ψ[i ↦ [Φ]Φ']

(Ψ; Φ ⊢p t : t_T) ~ (•; Φ' ⊢ t' : t'_T) ◁ σ̂_Ψ ▷ σ̂'_Ψ

    (Ψ; Φ ⊢p c : t) ~ (•; Φ' ⊢ c : t') ◁ σ̂_Ψ ▷ σ̂_Ψ

    (Ψ; Φ ⊢p s : s') ~ (•; Φ' ⊢ s : s') ◁ σ̂_Ψ ▷ σ̂_Ψ

    I · σ̂_Ψ = I'
    ----------------------------------------------------------
    (Ψ; Φ ⊢p f_I : t) ~ (•; Φ' ⊢ f_I' : t') ◁ σ̂_Ψ ▷ σ̂_Ψ

    (Ψ; Φ ⊢p t1 : s) ~ (•; Φ' ⊢ t1' : s) ◁ σ̂_Ψ ▷ σ̂'_Ψ
    (Ψ; Φ, t1 ⊢p ⌈t2⌉_|Φ| : s') ~ (•; Φ', t1' ⊢p ⌈t2'⌉_|Φ'| : s') ◁ σ̂'_Ψ ▷ σ̂''_Ψ
    ----------------------------------------------------------
    (Ψ; Φ ⊢p Π(t1).t2 : s'') ~ (•; Φ' ⊢ Π(t1').t2' : s'') ◁ σ̂_Ψ ▷ σ̂''_Ψ

    (Ψ; Φ, t1 ⊢p ⌈t2⌉_|Φ| : t') ~ (•; Φ', t1' ⊢p ⌈t2'⌉_|Φ'| : t'') ◁ σ̂_Ψ ▷ σ̂'_Ψ
    ----------------------------------------------------------
    (Ψ; Φ ⊢p λ(t1).t2 : Π(t1).⌊t'⌋_|Φ|) ~ (•; Φ' ⊢ λ(t1').t2' : Π(t1').⌊t''⌋_|Φ'|) ◁ σ̂_Ψ ▷ σ̂'_Ψ

    (Ψ; Φ ⊢p Π(ta).tb : s) ~ (•; Φ' ⊢p Π(ta').tb' : s) ◁ σ̂_Ψ|_relevant(Ψ ⊢p Φ wf) ▷ σ̂'_Ψ
    (Ψ; Φ ⊢p t1 : Π(ta).tb) ~ (•; Φ' ⊢p t1' : Π(ta').tb') ◁ σ̂'_Ψ ▷ σ̂_Ψ1
    (Ψ; Φ ⊢p t2 : ta) ~ (•; Φ' ⊢p t2' : ta') ◁ σ̂'_Ψ|_relevant(Ψ; Φ ⊢p ta : s) ▷ σ̂_Ψ2
    ----------------------------------------------------------
    (Ψ; Φ ⊢p t1 t2 : ⌈tb⌉_|Φ| · (id_Φ, t2)) ~ (•; Φ' ⊢p t1' t2' : ⌈tb'⌉_|Φ'| · (id_Φ', t2')) ◁ σ̂_Ψ ▷ σ̂_Ψ1 ∘ σ̂_Ψ2

    (Ψ; Φ ⊢p t : Type) ~ (•; Φ' ⊢p t' : Type) ◁ σ̂_Ψ ▷ σ̂'_Ψ
    (Ψ; Φ ⊢p t1 : t) ~ (•; Φ' ⊢p t1' : t') ◁ σ̂'_Ψ ▷ σ̂_Ψ1
    (Ψ; Φ ⊢p t2 : t) ~ (•; Φ' ⊢p t2' : t') ◁ σ̂'_Ψ ▷ σ̂_Ψ2
    ----------------------------------------------------------
    (Ψ; Φ ⊢p t1 = t2 : Prop) ~ (•; Φ' ⊢p t1' = t2' : Prop) ◁ σ̂_Ψ ▷ σ̂_Ψ1 ∘ σ̂_Ψ2

    σ̂_Ψ.i = ?    Ψ.i = [Φ*] t_T    t' <f |Φ* · σ̂_Ψ|
    ----------------------------------------------------------
    (Ψ; Φ ⊢p X_i/σ : t_T · σ) ~ (•; Φ' ⊢ t' : t'_T) ◁ σ̂_Ψ ▷ σ̂_Ψ[i ↦ [Φ*] t']

    σ̂_Ψ.i = [Φ*] t    t = t'
    ----------------------------------------------------------
    (Ψ; Φ ⊢p X_i/σ : t_T) ~ (•; Φ' ⊢ t' : t'_T) ◁ σ̂_Ψ ▷ σ̂_Ψ

Lemma D.31 The above rules are algorithmic.

Proved by the fact that they obey structural induction on the typing derivations, and are deterministic;
non-covered cases signify the unification failure result.

By mimicking the unification proof above, we could show independently that the above algorithm is sound,
that is, that the σ̂'_Ψ it returns if it is successful is actually a substitution that makes t and t' unify (as well
as Φ and Φ', along with t_T and t'_T) and is of the right type, provided that the assumptions about the input
substitution σ̂_Ψ do hold. Furthermore, we could show completeness, the fact that if the algorithm fails, no
such substitution actually exists.
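
The overall shape of such a matching procedure can be illustrated by a deliberately simplified Python sketch. It is not the algorithm of Definition D.30: it ignores typing derivations, contexts, freshening and the relevancy bookkeeping, and it threads one partial substitution through both subterms instead of joining two independently computed ones. The term encoding (tuples tagged `"const"`, `"app"`, `"pi"`, `"lam"`, `"meta"`) and the function name `unify` are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Term = Tuple  # ("const", name) | ("app" | "pi" | "lam", t1, t2) | ("meta", i)
Partial = List[Optional[Term]]  # assumption: None plays the role of "?"


def unify(pat: Term, term: Term, sub: Partial) -> Optional[Partial]:
    """Match a pattern against a closed term, extending the input substitution.

    Returns the extended substitution, or None when no substitution exists.
    """
    tag = pat[0]
    if tag == "meta":
        i = pat[1]
        if sub[i] is None:
            # Unspecified metavariable: instantiate it with the closed term.
            return sub[:i] + [term] + sub[i + 1:]
        # Already specified: the earlier instantiation must agree.
        return sub if sub[i] == term else None
    if tag != term[0]:
        return None                      # different head constructors
    if tag == "const":
        return sub if pat[1] == term[1] else None
    if tag in ("app", "pi", "lam"):
        first = unify(pat[1], term[1], sub)
        if first is None:
            return None
        return unify(pat[2], term[2], first)
    raise ValueError("unknown term constructor: %r" % (tag,))
```

Because every case either returns a single result or fails, the sketch is deterministic in the same sense as Lemma D.31: matching `plus X0 X0` against `plus a a` instantiates `X0` with `a`, while matching it against `plus a b` fails.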

D.5  Computational  language 

Here we will refine our results for progress and preservation from the previous section, using the above results.

Definition D.32 We refine the typing rule for pattern matching from definition C.4 as shown below.

^>^T-.K  'P,  'P  hp  ['P']  ,^1  wf 

'P,  Pp  -K  unspecvp,  ['P']|^|  □  relevant  ('P,  ['P']|^|  hp  [r']  :  .Sf) 

'P,  |'P']|^|;r;rh4']|^l  l^,|:rxl|^l,i-(idvi., 

*P;  Z;  r  h  unify  T  return  (.x)  with  (*P'.r'  e')\  { [x]  nj,|  j  •  (id>i<,  T))  +  unit 

Lemma D.33 (Substitution) Adaptation of the substitution lemma from C.13.

All the cases are entirely identical to the previous proof, with the exception of the pattern matching construct
which has a new typing rule. In that case, proceed similarly as before, with the use of the lemmas D.2, D.3 and
D.18 proved above.

Theorem  D.34  (Preservation)  Adaptation  of  theorem  C.16  to  the  new  rules. 

All  the  cases  are  entirely  identical  to  the  previous  proof,  with  the  exception  of  the  pattern  matching  construct. 
In  that  case,  we  need  to  explicitly  allude  to  the  fact  that  if  •  \-p  nj,|  wf,  then  obviously  also  •  h  |,j,|  wf. 
Similarly  we  have  that  |'t"|  •  ^  implies  [*^^0  ^  |'t"|  • 

Theorem  D.35  (Progress)  Adaptation  of  theorem  E.ll  to  the  new  rules. 

Again the only case that needs adaptation is the pattern matching case. In that case, we first note that if • ⊢ T : K
(as we have here), we also have • ⊢p T : K. Then, we allude to part 3 of theorem D.29 to split cases depending on whether
a suitable σ_Ψ exists or not. In both cases, one step rule is applicable: if a unique σ_Ψ exists, then it has the desired
properties for the first pattern matching step rule to work; if it does not, the second pattern matching step rule is
applicable.

D.6  Sketch:  practical  pattern  matching 

The  unification  algorithm  presented  above  requires  full  typing  derivations  for  terms,  something  that  is  unrealistic 
to  keep  around  as  part  of  the  runtime  representation  of  terms.  Here  we  will  present  an  informal  refinement  of 
the  above  algorithm,  that  works  on  suitably  annotated  terms,  instead  of  full  typing  derivations.  The  annotations 
are  the  minimal  possible  extra  information  needed  to  simulate  the  above  algorithm. 

Definition  D.36  We  define  a  notion  of  annotated  terms,  for  which  we  reuse  the  t  syntactic  class;  it  will  be 
apparent  from  the  context  whether  we  mean  a  normal  or  an  annotated  term. 

t ::= c | s | f_I | b_i | λ(t1).t2 | Π_s(t1).t2 | (t1 : t) t2 | t1 =_t t2 | X_i/σ

Lemma D.37 1. If t is an unannotated term with •; Φ ⊢p t : t' then there exists a derivation for •; Φ ⊢p
t : t' where all terms are annotated terms.

2.  The  inverse  is  also  true. 

These  are  trivial  to  prove  by  structural  induction  on  the  typing  derivations. 

Definition  D.38  The  unification  procedure  is  defined  through  the  following  judgement.  It  gets  W  as  a  global 
parameter,  which  we  omit  here. 


(r)  ~  (r') 

(t)  ~  (?')  <iunspecvj/>a^ 

(O,  O')  ~  (O,  O") 

(O')  ~  (O") 

([0]t)  ~  ([0]t')  0^ 

([0]0')  ~  ([0]0")  oaij- 

(<J>)  ~  l>CTvf<  (0  ~  (^0 

(•)  ~  (•)  0  unspec,|,  ((O,  t))  ~  ((O',  t')) 


(O,  Xi)  ~  (0,0')  =  unspec,|,[/ [0]0'] 
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(0  ~  (^0 


(c)  (c)  <10>j(  l>0>j< 


(5)  ~  (s)  <  >  0vi> 


^  =  /  (fl)  {t'l^  <l0>j(  l>0>j<  (  |'f2l  )  ~  (  |~^2l  ) 

(n^(ri).f2)  ~  (ni'(^i)-4)  <icT>i<i>a>i<^^ 


1-0$  =1' 


(/i)  ~  (/i)  <icj>i<c>avi/ 


([^2!)  ~  (|"41)  <i<7>i->aV 

{Kh).t2)  (X(fJ).4)  <]^c>a^p' 


(f)  ~  {t'^  <10VJ<I>0V[<  (fl)  ~  <10>J(  I>0>J<1  {1.2)  ~  I>CTvi'2 


0>I(j  o  0v[<2  =  0>I( 


(f)  ~  {t'^  <10vj;l>0V[< 


{{t\  :  t)  t2)  ~  {{t[  :  t')  4)  <]a>i<c>avi/ 

(^i)  r\j  I>CT>Pl  (^2)  ~  (4)“^^'P  I>f^‘I'2 


0>j(j  o  0v[<2  = 


{t\  =t  12)  ~  (4  =t'  4)  <i<?‘pi>cTvi- ' 


0VJ/./ =?  t'  <i^  |0-0vf<|  (5'ii.i  =  t' 

{Xi /a)  ~  (?')  <]  0  ^  1->  t']  {Xi /a)  ~  [t')  <1 
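Stripped of typing, contexts, and annotations, the judgement above performs one-sided matching: a pattern that may contain unification variables is compared against a closed term, and a partial substitution is filled in as variables are encountered. The following is a minimal first-order sketch of that idea only. The class names (`Const`, `App`, `PatVar`) and the function `match` are illustrative assumptions and do not appear in the original development, which additionally handles binders, contexts, and extension variables.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class PatVar:
    """A unification variable; assumed to occur only in patterns."""
    name: str


Term = Union[Const, App, PatVar]
Subst = Dict[str, Term]


def match(pattern: Term, term: Term, subst: Optional[Subst] = None) -> Optional[Subst]:
    """Return a substitution s such that pattern[s] == term, or None on failure."""
    subst = dict(subst or {})  # copy, so a failed branch cannot leak bindings
    if isinstance(pattern, PatVar):
        bound = subst.get(pattern.name)
        if bound is None:
            # Variable still unspecified: instantiate it with the scrutinee.
            subst[pattern.name] = term
            return subst
        # Variable already instantiated: the two instances must agree.
        return subst if bound == term else None
    if isinstance(pattern, Const):
        return subst if pattern == term else None
    if isinstance(pattern, App) and isinstance(term, App):
        after_fun = match(pattern.fun, term.fun, subst)
        if after_fun is None:
            return None
        # Thread the partial substitution through to the second subterm.
        return match(pattern.arg, term.arg, after_fun)
    return None
```

Threading the partial substitution from one subterm to the next mirrors the way the rules above combine the substitutions obtained from the premises.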

E.  Simple  staging  support 

Here we will add a light-weight staging support to the computational language. We extend the computational
language as follows.

Definition E.1 The syntax of the computational language is extended below.


$e ::= \cdots \mid \mathsf{letstatic}\ x = e\ \mathsf{in}\ e'$

$\Gamma ::= \cdots \mid \Gamma,\ x :_s \tau$

Definition  E.2  Freshening  and  binding  for  computational  types  and  terms  are  extended  as  follows. 


r 

\^\n,k 


[letstaticx  =  eine']iv/r  =  letstaticx  =  in  \e'^^’^^ 


I  \M 

1^\n,k 


[letstaticx  =  eine'Jiv/r  =  letstaticx  =  \e\^ ^  in  \e'\’^^ 


Definition  E.3  Extension  substitution  application  to  computational-level  types  and  terms. 


$e \cdot \sigma_\Psi$


$(\mathsf{letstatic}\ x = e\ \mathsf{in}\ e') \cdot \sigma_\Psi = \mathsf{letstatic}\ x = e \cdot \sigma_\Psi\ \mathsf{in}\ e' \cdot \sigma_\Psi$
Definition E.4 Limiting a context to the static types is defined as follows.


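$\Gamma|_{\mathrm{static}}$

$\bullet|_{\mathrm{static}} = \bullet$

$(\Gamma,\ x :_s \tau)|_{\mathrm{static}} = \Gamma|_{\mathrm{static}},\ x :_s \tau$

$(\Gamma,\ x : \tau)|_{\mathrm{static}} = \Gamma|_{\mathrm{static}}$

$(\Gamma,\ \alpha : k)|_{\mathrm{static}} = \Gamma|_{\mathrm{static}}$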
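Operationally, limiting a context to its static part is an order-preserving filter. The sketch below assumes that each binding carries a tag saying whether it is a static variable, an ordinary (dynamic) variable, or a type variable. The `Binding` record and the tag strings are hypothetical and serve only as an illustration.

```python
from typing import List, NamedTuple


class Binding(NamedTuple):
    name: str
    classifier: str  # the type (or kind) assigned to the variable
    tag: str         # "static", "dynamic", or "tyvar" (illustrative tags)


def limit_to_static(ctx: List[Binding]) -> List[Binding]:
    """Keep only the statically bound variables, preserving their order."""
    return [b for b in ctx if b.tag == "static"]
```

For example, limiting a context that contains a static `x`, a dynamic `y`, and a type variable `a` leaves only the binding for `x`.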


Definition  E.5  The  typing  judgements  for  the  computational  language  are  extended  below. 


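$\Psi; \Sigma; \Gamma \vdash e : \tau$


$$\frac{\bullet; \Sigma; \Gamma|_{\mathrm{static}} \vdash e : \tau \qquad \Psi; \Sigma; \Gamma,\ x :_s \tau \vdash e' : \tau'}{\Psi; \Sigma; \Gamma \vdash \mathsf{letstatic}\ x = e\ \mathsf{in}\ e' : \tau'} \qquad\qquad \frac{x :_s \tau \in \Gamma}{\Psi; \Sigma; \Gamma \vdash x : \tau}$$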

Definition  E.6  Small-step  operational  semantics  for  the  language  are  extended  below. 


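$e ::= \Lambda(K).e \mid e\ T \mid \mathsf{pack}\ T\ \mathsf{return}\ (.\tau)\ \mathsf{with}\ e \mid \mathsf{unpack}\ e\ (.)x.(e')$

$\quad \mid () \mid \mathsf{error} \mid \lambda x{:}\tau.e \mid e\ e' \mid x \mid (e, e') \mid \mathsf{proj}_i\ e \mid \mathsf{inj}_i\ e \mid \mathsf{case}(e, x.e', x.e'') \mid \mathsf{fold}\ e \mid \mathsf{unfold}\ e \mid \mathsf{ref}\ e$
$\quad \mid e := e' \mid\ !e \mid l \mid \Lambda\alpha{:}k.e \mid e\ \tau \mid \mathsf{fix}\ x{:}\tau.e$

$\quad \mid \mathsf{unify}\ T\ \mathsf{return}\ (.\tau)\ \mathsf{with}\ (\Psi.T' \mapsto e') \mid \mathsf{letstatic}\ x = e\ \mathsf{in}\ e'$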

V  ::=  A{K).ed  \  pack  T  return  (.x)  with  v  |  ()  |  Xx  :  x.ed  \  (v,  v')  |  inj,-  v  |  foid  v\l\  Aa  :  k.ed 

ed  ::=  A{K).ed  \edT  \  pack  T  return  (.x)  with  ed  \  unpacked  {.)x.{e'^) 

I  0  I  error  |  fix  :  x.ed  \ede'^\x\  {ed,  e'fj  \  proj,-  ed  \  inj,-  ed  \  case{ed,  x.e'^,  x.e'^)  \  foid  ed  \  unfoid  ed 
I  ref  ed  \  ed  :=  e'^  \  led  \  I  \  Aa:  k.ed  |  x  |  fix  x  :  x.ed 
I  unify  T  return  (.x)  with  {fid.T'  i-g  e'fij 

S  ::=  ietstaticx  =  •  in  e'  |  ietstaticx  =  S  in  e'  |  A{K).$  |  Xx  :  x.S  |  unpack  ed  (•)-^-(S) 

I  case(erf,  x.S,  x.ef)  \  ca.se{ed,  x.ed,  x.S)  |  Aa  :  k.$  |  fix  x  :  x.S  |  unify  T  return  (.x)  with  (*F.r'  i-g  §) 

I  £.[S] 

£,  ::=  Ls  T  \  pack  T  return  (.x)  with  £^  |  unpack  £^  {■)x.{e')  |  e'  |  ed  \  (£s,  e)  \  {ed,  £0  |  proj,-  £^ 

I  inj;  £^  I  case(£i,  x.ei,x.e2)  \  foid  Ls  \  unfoid  Es  \  ref  Es  \  Es  :=  e'  \  ed  :=  Es  \  !£i  |  Es  x 

£  ::=  •  I  £  r  I  pack  T  return  (.x)  with  £  |  unpack  £  {.)x.{ed)  |  £  |  v  £  |  (£,  |  (v,  £)  |  proj,-  £  |  inj,-  £ 

I  case(£,  x.e^,  x.e'^)  \  foid  £  |  unfoid  £  |  ref  £  |  £  :=  |  v  :=  £  |  !£  |  £  x 

$\mu ::= \bullet \mid \mu,\ l \mapsto v$


{h,e) — : 

e' 

)  error) 

{h,ed) 

— ^  (f  , 

e'd) 

(f , §70 ) 

— (f' 

,  §70 ) 

{p,  ed)  — >  error 
{p,  §[ed]  )  — error 


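$(\mu,\ \mathcal{S}[\mathsf{letstatic}\ x = v\ \mathsf{in}\ e]) \longrightarrow_s (\mu,\ \mathcal{S}[e[v/x]])$


$(\mu,\ \mathsf{letstatic}\ x = v\ \mathsf{in}\ e) \longrightarrow_s (\mu,\ e[v/x])$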
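The effect of the static step relation can be illustrated on a much smaller language. The sketch below is an assumption-laden toy and not the language of this appendix: it has only numerals, variables, addition, functions, application, and `letstatic`, with no store, no types, and no error state. The function `stage` plays the role of running static steps to completion: it looks under binders for `letstatic`, evaluates the bound expression with the ordinary evaluator `evaluate`, and substitutes the resulting value. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Expr"


@dataclass(frozen=True)
class App:
    fun: "Expr"
    arg: "Expr"


@dataclass(frozen=True)
class LetStatic:
    name: str
    bound: "Expr"
    body: "Expr"


Expr = Union[Num, Var, Add, Lam, App, LetStatic]


def substitute(e: Expr, name: str, value: Expr) -> Expr:
    """Replace free occurrences of `name` in e by `value`.

    Only closed values are ever substituted here, so no variable capture can occur.
    """
    if isinstance(e, Num):
        return e
    if isinstance(e, Var):
        return value if e.name == name else e
    if isinstance(e, Add):
        return Add(substitute(e.left, name, value), substitute(e.right, name, value))
    if isinstance(e, App):
        return App(substitute(e.fun, name, value), substitute(e.arg, name, value))
    if isinstance(e, Lam):
        return e if e.param == name else Lam(e.param, substitute(e.body, name, value))
    # LetStatic: the bound expression is always in scope, the body unless shadowed.
    body = e.body if e.name == name else substitute(e.body, name, value)
    return LetStatic(e.name, substitute(e.bound, name, value), body)


def evaluate(e: Expr) -> Expr:
    """Ordinary call-by-value evaluation of a closed, letstatic-free expression."""
    if isinstance(e, (Num, Lam)):
        return e
    if isinstance(e, Add):
        left, right = evaluate(e.left), evaluate(e.right)
        if not (isinstance(left, Num) and isinstance(right, Num)):
            raise TypeError("addition expects numbers")
        return Num(left.value + right.value)
    if isinstance(e, App):
        fun, arg = evaluate(e.fun), evaluate(e.arg)
        if not isinstance(fun, Lam):
            raise TypeError("application expects a function")
        return evaluate(substitute(fun.body, fun.param, arg))
    raise ValueError(f"cannot evaluate {e!r}")  # free variable or leftover letstatic


def stage(e: Expr) -> Expr:
    """Eliminate every letstatic, returning a purely dynamic expression."""
    if isinstance(e, (Num, Var)):
        return e
    if isinstance(e, Add):
        return Add(stage(e.left), stage(e.right))
    if isinstance(e, App):
        return App(stage(e.fun), stage(e.arg))
    if isinstance(e, Lam):
        return Lam(e.param, stage(e.body))  # static steps happen under binders
    value = evaluate(stage(e.bound))        # bound expression must be closed
    return stage(substitute(e.body, e.name, value))
```

Staging `Lam("y", LetStatic("x", Add(Num(1), Num(2)), Add(Var("x"), Var("y"))))` computes `1 + 2` once, before the function is ever applied. A static expression that mentions a dynamically bound variable makes `evaluate` raise an exception in this sketch; in the typed language that situation is ruled out by typing the bound expression in the context limited to static variables.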


Most lemmas are trivial to adapt. We adapt the substitution lemma for computational terms below.

Lemma  E.7  (Substitution)  1.  If  '¥,  'F';  F,  a'  :  X',  F'  h  x  :  X  and  'F;  F  h  x'  :  X'  then  'F,  'F';  F,  F'[x7a']  h 
x[x7a'] :  X. 
2.  //'F,  'F';  I  r,  a' :  r ,  r  h  e  :  X  and  'F;  T  h  x' :  A:'  then  'F,  'F';  I;  T,  T'[x' /a']  h  efx'/a']  :  x[x7a'].

3.  //'F,  'F';  I  r,  x' :  x',  T  h  :  x  anJ  'F;  I;  T  h  :  x'  then  'F,  'F';  I;  T,  T'  h  :  X. 

4.  r,  X  X,  r'  h  e  :  x'  ant/  •;  Z;  •  h  v  :  X  then  *F;  Z;  F,  F'  h  e[v/x]  :  x'. 

5.  //'*F;  F,  X  :  X,  F'  h  e  :  x'  a?iJ  •;  Z;  •  h  v  :  X  then  'F;  Z;  F,  F'  h  e[v/x]  :  x'. 

Easy  proof  by  structural  induction  on  the  typing  derivation  for  e.  We  prove  the  interesting  cases  below: 


Part  3.  Case 


,XGF 


l> 


'F;  Z;  Fhx:x 

We  have  that  ed[e'^/x]  =  e'^,  and  *F;  Z;  F  h  :  x,  which  is  the  desired. 


•  ;  Z;  Fistatic  F  e  :  X  'F;  Z;  F,  x  x  h  e  :  x 

Case - ; — ; -  > 

*F;  Z;  Fh  letstaticx  =  e  in  e  :  x 

Impossible  case,  because  the  theorem  only  has  to  do  with  ed  cases. 
Part  4.  Most  cases  are  trivial.  The  only  interesting  case  follows. 


•  ;  Z;  Fistatic, -x  :  X,  F' I  static  F  e  :  X  'F;  Z;  F,  x  x,  F',  x' x"  F  e' :  x' 

C  _ 

*F;  Z;  F,  x  X,  F'  F  letstaticx'  =  e  in  e' :  x' 

We  use  part  5  for  e  to  get  that  •;  Z;  F| static,  C'j static  F  e[v/x]  :  X. 

By  induction  hypothesis  for  e'  we  get  *F;  Z;  F,  F',x'  x"  F  e'[v/x]  :  x'. 

Thus  using  the  same  typing  rule  we  get  the  desired  result. 

Part  5.  Trivial  by  structural  induction. 


Lemma  E.8  (Types  of  decompositions)  1.  7/'*F;  Z;  F  F  S[e]  :  x  with  T\static  =  •,  then  there  exists  x'  such  that 
•  ;  Z;  •  F  e  :  x'  and  for  all  e'  such  that  •;  Z;  •  F  e' :  x',  we  have  that  *F;  Z;  F  F  S[e']  :  x. 

2.  //'F;  Z;  F  F  [e]  :  x  then  there  exists  x'  such  that  *F;  Z;  F  F  e  :  x'  and  for  all  e'  such  that  *F;  Z;  F  F  e' :  x',  we 
have  that  *F;  Z;  F  F  £i[e']  :  X. 

Part  1.  By  structural  induction  on  S. 

Case  §  =  letstaticx  =  •  in  e'  [>  By  inversion  of  typing  we  get  that  •;  Z;  Fjstatic  F  e  :  x.  We  have  F|static,  thus 
we  get  •;  Z;  •  F  e  :  x'.  Using  the  same  typing  rule  we  get  the  desired  result  for  §[e']. 

Case  §  =  letstatic  x  =  S'  in  e"  >  By  inductive  hypothesis  for  S'  we  get  the  desired  directly. 

Case  S  =  A(.^f).S'  >  We  have  that  [S'[ei;]]  =  S"[[e]]  with  S"  =  [S'[«]].  By  inductive  hypothesis  for 
*F,  Z;  F  F  S"[[e]]  :  x  we  get  that  •;  Z;  •  F  [e]  :  x'.  From  this  we  directly  get  [e]  =  e,  and  the  desired 
follows  immediately  (using  the  rest  of  the  inductive  hypothesis). 

Case  S  =  £i[S]  >  We  have  that  'F;  Z;  F  F  £i[S[erf]]  :  x.  Using  part  2  for  and  $[ed\  we  get  that  *F;  Z;  F  F 
ii[ed]  ■  x'  for  some  x'  and  also  that  for  all  e'  such  that  'F;  Z;  F  F  e  :  x',  'F;  Z;  F  F  £i[e']  :  X.  Then  using  induction 
hypothesis  we  get  a  x"  such  that  •;  Z;  •  F  :  x".  For  this  type,  we  also  have  that  •;  Z;  •  F  :  x"  implies 
*F;  Z;  F  F  2>[e'f\  :  x',  which  further  implies  *F;  Z;  F  F  £i[S[e7]  • 

The  rest  of  the  cases  follow  similar  ideas. 
Part 2. By induction on the structure of $\mathcal{E}_s$. In each case, use inversion of typing to get the type for $e$, and then
use the same typing rule to get the derivation for $\mathcal{E}_s[e']$.

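Theorem E.9 (Preservation) 1. If $\bullet; \Sigma; \bullet \vdash e : \tau$, $\mu \sim \Sigma$, and $(\mu, e) \longrightarrow_s (\mu', e')$, then there exists $\Sigma'$ such that
$\Sigma \subseteq \Sigma'$, $\mu' \sim \Sigma'$ and $\bullet; \Sigma'; \bullet \vdash e' : \tau$.

2. If $\bullet; \Sigma; \bullet \vdash e_d : \tau$, $\mu \sim \Sigma$, and $(\mu, e_d) \longrightarrow (\mu', e_d')$, then there exists $\Sigma'$ such that $\Sigma \subseteq \Sigma'$, $\mu' \sim \Sigma'$ and
$\bullet; \Sigma'; \bullet \vdash e_d' : \tau$.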

Part  1  We  proceed  by  induction  on  the  derivation  of  ( /r ,  e  )  — )■  ( /r'  ,  e' ) . 

Case  ( ^  ,e'd)  ^ 

Using  the  lemma  E.8  we  get  •;  Z;  •  h  :  x'.  Using  part  2,  we  get  that  •;  Z;  •  h  :  x'.  Thus,  using  again  the 
same  lemma  we  get  the  desired. 

Case  (p  ,  S[letstatic.r  =  v  in  e]  )  — (/r ,  S[e[v/x]]  )  > 

Using  the  lemma  E.8  we  get  •;  Z;  •  h  letstatic  .r  =  v  in  e  :  x'.  By  typing  inversion  we  get  that  •;  Z;  •  h  v  :  x", 
and  also  that  •;  Z;  x  x"  h  e  :  x'.  Using  the  substitution  lemma  E.7  we  get  the  desired  result. 

The  rest  of  the  cases  are  trivial. 

Part 2 Proceeds exactly as before, as $e_d$ entirely matches the definition of expressions prior to the extension.


Theorem E.10 (Unique decomposition) 1. For every expression $e$, we have:

(a) Either $e$ is a dynamic expression $e_d$, in which case there is no way to write $e_d$ as $\mathcal{S}[e']$ for any $e'$.

(b) Or there is a unique decomposition of $e$ into $\mathcal{S}[e_d]$.

2. For every expression $e_d$, we have:

(a) Either it is a value $v$ and the decomposition $v = \mathcal{E}[e]$ implies $\mathcal{E} = \bullet$ and $e = v$.

(b) Or, there is a unique decomposition of $e_d$ into $e_d = \mathcal{E}[v]$.

Part  1.  Proceed  by  induction  on  the  structure  of  the  expression  e. 

Case $\Lambda(K).e'$ ▷ By induction hypothesis on the structure of $e'$. If we have $e' = e_d$, then this is a dynamic
expression already. In the other cases, we get a unique decomposition of $e'$ into $\mathcal{S}'[e'']$. The original expression
$e$ can be uniquely decomposed using $\mathcal{S} = \Lambda(K).\mathcal{S}'$, with $e = \mathcal{S}[e'']$. This decomposition is unique because the
outer frame is uniquely determined; if the inner frames or the expression filling the hole could be different, we
would violate the uniqueness part of the decomposition returned by induction hypothesis.

Case $e'\ T$ ▷ By induction hypothesis we get that either $e' = e_d'$, or there is a unique decomposition of $e'$ into
$\mathcal{S}'[e'']$. In that case, $e$ is uniquely decomposed using $\mathcal{S} = \mathcal{E}_s[\mathcal{S}']$ with $\mathcal{E}_s = \bullet\ T$, into $e = \mathcal{S}'[e'']\ T$.

Case $\mathsf{unpack}\ e'\ (.)x.(e'')$ ▷ By induction hypothesis on $e'$; if it is a dynamic expression, then by induction
hypothesis on $e''$; if that too is a dynamic expression, then the original expression is too. Otherwise, use
the unique decomposition of $e'' = \mathcal{S}'[e''']$ to get the unique decomposition $e = \mathsf{unpack}\ e_d\ (.)x.(\mathcal{S}'[e'''])$. If $e'$
is not a dynamic expression, use the unique decomposition of $e' = \mathcal{S}''[e'''']$ to get the unique decomposition
$e = \mathsf{unpack}\ \mathcal{S}''[e'''']\ (.)x.(e'')$.
Case $\mathsf{letstatic}\ x = e'\ \mathsf{in}\ e''$ ▷ By induction hypothesis on $e'$.

In the case that $e' = e_d$, then trivially we have the unique decomposition $e = (\mathsf{letstatic}\ x = \bullet\ \mathsf{in}\ e'')[e_d]$.
In the case where $e' = \mathcal{S}[e_d]$, we have the unique decomposition $e = (\mathsf{letstatic}\ x = \mathcal{S}\ \mathsf{in}\ e'')[e_d]$.
The rest of the cases are similar.

Part 2. Trivial by induction on the structure of the dynamic expression $e_d$.


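Theorem E.11 (Progress) 1. If $\bullet; \Sigma; \bullet \vdash e : \tau$ and $\mu \sim \Sigma$, then either $\mu, e \longrightarrow_s \mathsf{error}$, or $e$ is a dynamic expression
$e_d$, or there exist $\mu'$ and $e'$ such that $\mu, e \longrightarrow_s \mu', e'$.

2. If $\bullet; \Sigma; \bullet \vdash e_d : \tau$ and $\mu \sim \Sigma$, then either $\mu, e_d \longrightarrow \mathsf{error}$, or $e_d$ is a value $v$, or there exist $\mu'$ and $e_d'$ such that
$\mu, e_d \longrightarrow \mu', e_d'$.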

Part 1 First, we use the unique decomposition lemma E.10 to get that either $e$ is a dynamic expression, in
which case we are done, or $e$ decomposes into $\mathcal{S}[e_d']$. In that case, we use lemma E.8 and part 2 to get
that either $e_d'$ is a value or that some progress can be made: either by failing or by stepping to $\mu', e_d''$, in which case
we use the appropriate rule for $\longrightarrow_s$ either to fail or to progress to $\mu', \mathcal{S}[e_d'']$. If $e_d'$ is a value, then we split cases
depending on $\mathcal{S}$: either it is simply $\mathsf{letstatic}\ x = \bullet\ \mathsf{in}\ e$ or it is nested. In both cases we make progress using the
appropriate step rule.

Part 2 Identical as before.

F.  Collapsing  terms  with  extension  variables  into  terms  with  normal  variables 

Definition F.1 A decidable judgement for deciding whether a term $t$, a context $\Phi$, etc. are collapsible to a
normal logical term is given below.

Intuitively, it defines as collapsible those terms where all contexts $\Phi$ involved (even inside extension variable types)
are subcontexts of a single context $\Phi''$ (which is the result of the procedure), and all extension variables are used
with identity substitutions of that context.

collapsible  (*T)  o  <!>'  i>  <I>" 


collapsible  (*T)  <]<!>'  > <I>"  collapsible  {K)  < <I>"  i> 
collapsible  (•)  collapsible  (*T,  .S') 


collapsible  (.^f  )<]<!>'  i><I>" 


K  =  T  collapsible  (r)<]<I>'i><l>" 
collapsible  (.^f)  <i<I>'  oO" 


collapsible  (<1>)  <i<I>'i><l>" 
collapsible  ([<!>]  ctx)  <]<!>'  i><I>" 


collapsible  (T  )<]<!>'  i>  <1>" 


collapsible  (<I>)  collapsible  (t) 

collapsible  ( [<1>]  t )  <i  <!>'  i>  <I>" 


collapsible  (<I>i,  <I>2)<i<I>'i><I>" 
collapsible  ([<I>i]<I>2)  ><I>" 
collapsible  (O)  oO"


collapsible  <i)  =  <i>"  collapsible  (f)  <i<I> 

collapsible (•)<i<I>'i><I>'  collapsible  (<I>,  >(<!>,  t) 

collapsible  (<I>)<i<I>'><I>"  <I>  c  <I>"  collapsible  (f)  collapsible  <i)  =  <i)" 

collapsible  (<I>,  collapsible  (<I>,  >(<!>,  X,) 

collapsible  (<I>)  <]<!>' ><!>"  <I>  C  <I>" 

collapsible  (<I>,  X,)  <]<!>' 


collapsible  (f)<i  O' 


collapsible  (5)  <1  O'  collapsible  (c)<iO'  collapsible  (/i)  <iO' 


collapsible  (?i)  <]0'  collapsible  ( Itj] )  <]0' 

collapsible  ).t2)<  O' 


collapsible  (fi)  oO'  collapsible  {\t2] )  <iO' 
collapsible  (n(fi)i2)  <iO' 


collapsible  (fi )  <iO'  collapsible  (f2)  <iO' 
collapsible  {t\  f2)  <iO' 


collapsible  (fi)  oO'  collapsible  (f2)  <]0' 
collapsible  {tl=  f2)  <JO' 


a  c  idO' 

collapsible  (X, /a)  <]  O' 


collapsible (*F  h  r  :^)i>0' 


collapsible  (*F)  <i»>0'  collapsible  (^)  <]0'i>0"  collapsible  (r)  <]0"t>0" 

collapsible (*F  h  r  :^)>0" 


collapsible  h  O  wf)  oO' 

collapsible  (*F)  <i«  i>0'  collapsible  (O)  -oO'  >0" 
collapsible  h  O  wf)  oO" 

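Lemma F.2 1. If $\mathrm{collapsible}(\Phi) \triangleleft \Phi' \triangleright \Phi''$ then either $\Phi' \subseteq \Phi$ and $\Phi'' = \Phi$, or $\Phi \subseteq \Phi'$ and $\Phi'' = \Phi'$.

2. If $\mathrm{collapsible}(\Psi \vdash [\Phi]\ t : [\Phi]\ t_T) \triangleright \Phi'$ then $\Phi \subseteq \Phi'$.

3. If $\mathrm{collapsible}(\Psi \vdash [\Phi_0]\ \Phi_1 : [\Phi_0]\ \Phi_1) \triangleright \Phi'$ then $\Phi_0, \Phi_1 \subseteq \Phi'$.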

Trivial  by  structural  induction. 

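Lemma F.3 1. If $\mathrm{collapsible}(\Psi) \triangleleft \bullet \triangleright \Phi$ and $\Phi \subseteq \Phi'$ then $\mathrm{collapsible}(\Psi) \triangleleft \Phi' \triangleright \Phi'$.

2. If $\mathrm{collapsible}(K) \triangleleft \bullet \triangleright \Phi$ and $\Phi \subseteq \Phi'$ then $\mathrm{collapsible}(K) \triangleleft \Phi' \triangleright \Phi'$.

3. If $\mathrm{collapsible}(T) \triangleleft \bullet \triangleright \Phi$ and $\Phi \subseteq \Phi'$ then $\mathrm{collapsible}(T) \triangleleft \Phi' \triangleright \Phi'$.

4. If $\mathrm{collapsible}(\Phi_0) \triangleleft \bullet \triangleright \Phi$ and $\Phi \subseteq \Phi'$ then $\mathrm{collapsible}(\Phi_0) \triangleleft \Phi' \triangleright \Phi'$.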
Trivial by structural induction on the collapsing relation.


Lemma  F,4  lf\-  wf  and  collapsible then  there  exist  ajj;,  a|,,  and  a  ^  such  that: 

'Ll  h  :  'T, 

•  h  a|, ;  xpi, 

'Ll  h  <Li  wf, 

'T‘;  <I>i  hai  :<I>°-a4-. 

'T;  <I>°ha  1 

for  all  t  such  that  *T;  <I>*^  h  t  :  t',  we  have  t  *  CT^  *  CT^  *  cy^  *  (5  ^  =t,  and  all  members  of'¥^  are  of  the  form  [<!>*]  t 
where  <L*  C  <I)^ 


By  induction  on  the  derivation  of  the  relation  collapsible  (*T)  <]•  i><L°. 


Case  =  •  > 

We  choose  =  •;  =  a|,  =  •;  <I>^  =  •;  =  •;  =  •  <l)i  =  •  and  the  desired  trivially  hold. 

Case'T  =  'T',  [<I>]ctx  D> 

From  the  collapsable  relation,  we  get:  collapsible  (*F')  <i  •  i>  collapsible  ([<!>]  ctx) -d  <!>''’>  By  induction 
hypothesis  for  *F',  get: 

'F'l  h  :  »F', 

•  Ha(2 
'F'l  h  wf, 

O'l  ha'i 
'F'; 

for  all  t  such  that  *F';  \-  t  :t',  we  have  t  •  =  t,  and  all  members  of  *F'^  are  of  the 

form  [<!>*]  t  where  <F*  C  <J>'^ 

By  inversion  of  typing  for  [<F]  ctx  we  get  that  *F'  h  <I>  wf. 

We  fix  [<!>'*’  •  a(^]  which  is  a  valid  choice  as  long  as  we  select  *F^  so  that  C  'F^  This  substitution 

has  correct  type  by  taking  into  account  the  substitution  lemma  for  O'**  and  a(^. 

For  choosing  the  rest,  we  proceed  by  induction  on  the  derivation  of  <!>'*’  C 
If<I)0  =  <I)'0,then: 

We  have  <F  C  <!>'*’  because  of  the  previous  lemma. 

Choose  *F'  =  'F'^  ;  ;  <I>^  =  <I>'^  ;  =  a''  ; 

Everything  holds  trivially,  other  than  ajj,  typing.  This  too  is  easy  to  prove  by  taking  into  account  the 
substitution  lemma  for  <F  and  a^.  Also,  typing  uses  extension  variable  weakening.  Last,  for  the 
cancellation  part,  terms  that  are  typed  under  *F  are  also  typed  under  *F'  so  this  part  is  trivial  too. 

If  =  <!>'*’,  t,  then:  (here  we  abuse  things  slightly  -  by  identifying  the  context  and  substitutions  from  induction 
hypothesis  with  the  ones  we  already  have:  their  properties  are  the  same  for  the  new  <!>'*’) 

We  have  <!>  =  <!>*’  =  t  because  of  the  previous  lemma  (<I>°  is  not  thus  <!>*’  =  <I>). 

First,  choose  <I>^  =  <I>'^ ,  t  •  This  is  a  valid  choice,  because  *F';  <I>'°  h  t :  5;  by  applying  we 

get  *F'^ ;  <!>''’  •  h  t  •  :  5;  by  applying  we  get  *F'^ ;  h  t  •  :  5. 

Thus  *F'^  h  f  •  wf  (and  the  *F^  we  will  choose  is  supercontext  of  *F'^). 

Now,  choose  'F^  =  *F'^ ,  t  •  This  is  well-formed  because  of  what  we  proved  above  about 

the  substituted  t,  taking  weakening  into  account.  Also,  the  condition  for  the  contexts  in  *F^  being 
subcontexts  of  <I>^  obviously  holds. 
Choose  a|,  =  /jd)'! |  ■  We  have  •  h  a|, :  directly  by  our  construction.

Choose  /jo'*  |  •  We  have  that  this  latter  term  ean  be  typed  as  ;  <!>'  h  /j<j)'i  |  \  t  ■  •  a'' ,  and 

thus  we  have  h  :  O'*’  •  a(p,  t  ■  a(p. 

Choose  whieh  is  typed  eorreetly  sinee  t  •  •  a'*  •  =  t.  Last,  assume 

'L;  O'*’,  the:  t'.  We  prove  t*  •  •  a’  •  a|,  •  =  4. 

First, $t_*$ is also typed under $\Psi'$ because $t_*$ cannot use the newly-introduced variable directly
(even in the case where it would be part of $\Phi_0$, there's still no extension variable that has
$X_{|\Psi'|}$ in its context).

Thus  it  suffiees  to  prove  4  •  •  a’  •  a|,  •  =4. 

Then  proeeed  by  struetural  induetion  on  t*.  The  only  interesting  ease  oeeurs  when  t*  = 

/|<j>/0|,  in  whieh  ease  we  have: 

/|<j)'0|  ■  tJip  •  Cj’  •  •  CT  ’  =  /|<j)'0.(jn I  •  Cj’  •  •  CT  ^  =  /jo'* |  '  •  CJ  ’  =  /|(j,/i.jj2^|  •  CJ  ’  =  /jd)* 

IfO*’  =  0'*’,  Xf. 

By well-formedness inversion we get that $\Psi.i = [\Phi_*]\ \mathrm{ctx}$, and by repeated inversions of the collapsible
relation we get $\Phi_* \subseteq \Phi'^0$.

Choose  o’  =  O'*  ;  =  *F'’  ;  a|,  =  a’  =  a'*;  =  a'^’. 

Most  desiderata  are  trivial.  For  a’,  note  that  (O'* ,  X,)  •  =  O'*  •  sinee  by  eonstruetion  we  have 

that  always  substitutes  parametrie  eontexts  by  the  empty  eontext. 

For  eaneellation,  we  need  to  prove  that  for  all  t  sueh  that  *F;  O'*’,  Xi  h  t*  :  t',  we  have  t*  •  •  a’  • 

a|,  •  =4.  This  is  proved  direetly  by  notieing  that  4  is  typed  also  under  *F'  (if  Xi  was  the  just- 

introdueed  variable,  it  wouldn’t  be  able  to  refer  to  itself). 

Case'F  =  'F',  [0]t  > 

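From the collapsible relation, we get: $\mathrm{collapsible}(\Psi') \triangleleft \bullet \triangleright \Phi'^0$, $\mathrm{collapsible}(\Phi) \triangleleft \Phi'^0 \triangleright \Phi^0$, $\mathrm{collapsible}(t) \triangleleft \Phi^0$. By
induction hypothesis for $\Psi'$, get: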

'F'’  h  0(1  :  »F', 

•  Fa(2  :»F'’, 

'F'’  h  O'*  wf, 

'F'’;  O'l  ha'’  :0'*’-a(J, 

'F';  O'”  ha'-’  :0'’-a(^, 

for  all  t  such  that  *F';  O'”  h  t  :  t',  we  have  t  •  a(^  •  a'’  •  a^  •  a'-’  =  t,  and  all  members  of  *F'’  are  of  the 
form  [O*]  t  where  O*  C  O'’. 

Also  from  typing  inversion  we  get:  *F'  h  O  wf  and  'F';  O  h  t  :  5. 

We  proceed  similarly  as  in  the  previous  case,  by  induction  on  O'”  C  O”,  in  order  to  redefine  *F'’ ,  a(^,  a(^.  O'’ ,  a'’ ,  a 
with  the  properties: 

'F'’  h  a^i  :  'F', 

•  ha;^:^’, 

'F'’  h  O'’  wf, 

'F'’;  O'’  ha'’  :0”-a(^, 

'F';  0”ha'-’  :0'’-a;^, 

for  all  t  such  that  *F';  O”  h  t :  t' ,  we  have  f  •  a(^  •  a'’  •  a(^  •  a'-’  =  t,  and  all  members  of  *F'’  are  of  the 
form  [O*]  t  where  O*  C  O'’. 

Now  we  have  O  C  O”  thus  *F';  O”  h  t  :  5. 

By  applying  a(^  and  then  a'’  to  t  we  get  *F'’ ;  O'’  h  t  •  a(^  •  a'’  \  s.  We  can  now  choose  O’  =  O'’ ,  t  •  a(^  •  a'’ . 
Choose  *F’  =  *F'’ ,  [O’]  t  •  a(^  •  a'’ .  It  is  obviously  well-formed. 

Now,  will  choose  a]j,: 
Need  to  choose  such  that  ;  <I>  •  h  t ^  \t  ■  a(p.

Assuming  =  A|<j,/i|/a,  we  need  t  •  •  a  =  t  •  and  <I>  •  h  a  : 

Thus,  what  we  require  is  the  inverse  of  a'.  By  eonstruetion,  there  exists  sueh  a  a,  beeause  is  just 

a  variable  renaming.  (Note  that  this  is  different  from 

Therefore,  set  [<I>]  A|(J>/i|/ct,  whieh  has  the  desirable  properties. 

Choose  a|,  =  .  We  trivially  have  •  h  a|, : 

Choose  with  typing  holding  obviously. 

Choose  Anj</|/id<I>.  Consider  the  eaneellation  faet;  typing  is  then  possible. 

It  remains  to  prove  that  for  all  4  sueh  that  [<I>]  f ;  <!>*’  h  t*  :  t',  we  have  t  •  •  a|,  •  =t. 

This is done by structural induction on $t_*$, with the interesting case being $t_* = X_{|\Psi'|}/\sigma_*$. By inversion of the collapsible
relation, we get that $\sigma_* = \mathrm{id}_\Phi$.

Thus  (A|>j</|/id<I>)  =  (A|<j>,i|/a)  •  (id<I>-a^)  =  (A|<j,/i|/a)-  (idO-a^)  = 

(A|<„,i|/a)  =  (A|(„,i|/(a-a^))  =  (A|<j>,i|/(id<I>^))  -al, =  (/jo'i|  •  (id<I>^  -a^))  = 

-idO^  -a-i  =  A|>i„|/id<I>. 

Theorem  F.5  lf^\-  [<!’]  t  :  [<!’]  h  and  collapsible {'¥  h  [<I>]  t  :  [<I>]  tj)  =  <!>*,  then  there  exist  <!>',  t' ,  t'j  and  a  such 
thaf'r^'wf  •h[<^']t' :  [O']^,  'T;  <I>  h  a  :  <!>',  t'  ■  a  =  t  and  t'j- ■  o  =  tj. 

Easy  to  prove  using  above  lemma.  Set  <!>'  =  <I>C  a^,  t'  =  t  •  ajj,  •  •  a^,  tj  =  tj  ■  ■  a^,  and  also  set 
A  Case  for  Behavior-Preserving  Actions  in
Separation  Logic 


David  Costanzo  and  Zhong  Shao 
Yale  University 


Abstract.  Separation  Logic  is  a  widely-used  tool  that  allows  for  local 
reasoning  about  imperative  programs  with  pointers.  A  straightforward 
definition  of  this  “local  reasoning”  is  that,  whenever  a  program  runs 
safely on some state, adding more state would have no effect on the program's
behavior. However, for a mix of technical and historical reasons,
local  reasoning  is  defined  in  a  more  subtle  way,  allowing  a  program  to 
lose  some  behaviors  when  extra  state  is  added.  In  this  paper,  we  propose 
strengthening  local  reasoning  to  match  the  straightforward  definition 
mentioned  above.  We  argue  that  such  a  strengthening  does  not  have  any 
negative  effect  on  the  usability  of  Separation  Logic,  and  we  present  four 
examples  that  illustrate  how  this  strengthening  simplifies  some  of  the 
metatheoretical  reasoning  regarding  Separation  Logic.  In  one  example, 
our  change  even  results  in  a  more  powerful  metatheory. 


1  Introduction 

Separation  Logic  [8, 13]  is  widely  used  for  verifying  the  correctness  of  C-like 
imperative  programs  [9]  that  manipulate  mutable  data  structures.  It  supports 
local  reasoning  [15]:  if  we  know  a  program’s  behavior  on  some  heap,  then  we 
can  automatically  infer  something  about  its  behavior  on  any  larger  heap.  The 
concept  of  local  reasoning  is  embodied  as  a  logical  inference  rule,  known  as  the 
frame  rule.  The  frame  rule  allows  us  to  extend  a  specification  of  a  program’s 
execution  on  a  small  heap  to  a  specification  of  execution  on  a  larger  heap. 
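For reference, the frame rule is usually written as follows. This is the standard textbook presentation, stated here from general knowledge of Separation Logic rather than quoted from this paper; the side condition is the usual one for languages with mutable program variables.

```latex
\[
\frac{\{P\}\; C\; \{Q\}}
     {\{P * R\}\; C\; \{Q * R\}}
\qquad \text{(no variable modified by } C \text{ occurs free in } R\text{)}
\]
```

Here $P * R$ holds of a heap that can be split into two disjoint parts, one satisfying $P$ and the other satisfying $R$. The premise is the specification on the small heap; the conclusion is the specification on the larger heap, with the frame $R$ carried through unchanged.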

For  the  purpose  of  making  Separation  Logic  extensible,  it  is  common  practice 
to  abstract  over  the  primitive  commands  of  the  programming  language  being 
used.  By  “primitive  commands”  here,  we  mean  commands  that  are  not  defined 
in  terms  of  other  commands.  Typical  examples  of  primitive  commands  include 
variable assignment x := E and heap update [E] := E'. One example of a
non-primitive command is while B do C.

When  we  abstract  over  primitive  commands,  we  need  to  make  sure  that  we 
still  have  a  sound  logic.  Specifically,  it  is  possible  for  the  frame  rule  to  become 
unsound  for  certain  primitive  commands.  In  order  to  guarantee  that  this  does  not 
happen,  certain  “healthiness”  conditions  are  required  of  primitive  commands.  We 
refer  to  these  conditions  together  as  “locality,”  since  they  guarantee  soundness 
of  the  frame  rule,  and  the  frame  rule  is  the  embodiment  of  local  reasoning. 

As  one  might  expect,  locality  in  Separation  Logic  is  defined  in  such  a  way  that 
it  is  precisely  strong  enough  to  guarantee  soundness  of  the  frame  rule.  In  other 
words,  the  frame  rule  is  sound  if  and  only  if  all  primitive  commands  are  local.
In  this  paper,  we  consider  a  strengthening  of  locality.  Clearly,  any  strengthening 
will  still  guarantee  soundness  of  the  frame  rule.  The  tradeoff,  then,  is  that  the 
stronger  we  make  locality,  the  fewer  primitive  commands  there  will  be  that  satisfy 
locality.  We  claim  that  we  can  strengthen  locality  to  the  point  where:  (1)  the 
usage  of  the  logic  is  unaffected  —  specifically,  we  do  not  lose  the  ability  to  model 
any  primitive  commands  that  are  normally  modeled  in  Separation  Logic;  (2)  our 
strong  locality  is  precisely  the  property  that  one  would  intuitively  expect  it  to 
be  —  that  the  behavior  of  a  program  is  completely  independent  from  any  unused 
state;  and  (3)  we  significantly  simplify  various  technical  work  in  the  literature 
relating  to  metatheoretical  facts  about  Separation  Logic.  We  refer  to  our  stronger 
notion  of  locality  as  “behavior  preservation,”  because  the  behavior  of  a  program 
is  preserved  when  moving  from  a  small  state  to  a  larger  one. 
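The gap between the two notions can be made concrete on a toy model. The sketch below rests on several assumptions that are not taken from the paper: a state is just the set of allocated addresses drawn from a three-element universe (finite only so that all states can be enumerated), an action maps a state either to a set of possible result states or to a fault, and allocation from a full state is modeled as having no terminating result. The first two checks are one reading of the standard locality conditions; `forward_frame` is this sketch's reading of "adding more state has no effect on the program's behavior". All function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, List, Set, Tuple, Union

UNIVERSE: FrozenSet[int] = frozenset(range(3))
State = FrozenSet[int]
FAULT = "fault"
Outcome = Union[str, Set[State]]
Action = Callable[[State], Outcome]


def all_states() -> List[State]:
    addrs = sorted(UNIVERSE)
    return [frozenset(c) for n in range(len(addrs) + 1) for c in combinations(addrs, n)]


def disjoint_pairs() -> Iterator[Tuple[State, State]]:
    """All (small state, extra state) pairs that can be combined."""
    for small in all_states():
        for extra in all_states():
            if not (small & extra):
                yield small, extra


def alloc(state: State) -> Outcome:
    """Nondeterministically allocate any currently free address."""
    return {state | {a} for a in UNIVERSE - state}


def free0(state: State) -> Outcome:
    """Deallocate address 0; fault if it is not allocated."""
    return {state - {0}} if 0 in state else FAULT


def safety_monotonic(act: Action) -> bool:
    # Safe on the small state implies safe on every larger state.
    return all(act(s | x) != FAULT for s, x in disjoint_pairs() if act(s) != FAULT)


def frame_property(act: Action) -> bool:
    # Every result on the larger state is a result on the small state
    # combined with the untouched extra state.
    for s, x in disjoint_pairs():
        if act(s) == FAULT:
            continue
        big = act(s | x)
        if big == FAULT:
            return False
        if any(not (x <= r and (r - x) in act(s)) for r in big):
            return False
    return True


def forward_frame(act: Action) -> bool:
    # Every result on the small state is still possible, extended by the
    # extra state, on the larger state.
    for s, x in disjoint_pairs():
        small = act(s)
        if small == FAULT:
            continue
        big = act(s | x)
        if big == FAULT:
            return False
        if any((r & x) or (r | x) not in big for r in small):
            return False
    return True
```

In this model `alloc` passes `safety_monotonic` and `frame_property` but fails `forward_frame`: from the empty state it may return address 0, a behavior that disappears once address 0 is part of the added state. The deterministic `free0` passes all three checks.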

We  justify  statement  (1)  above,  that  the  usage  of  the  logic  is  unaffected, 
in  Section  3  by  demonstrating  a  version  of  Separation  Logic  using  the  same 
primitive  commands  as  the  standard  one  presented  in  [13],  for  which  our  strong 
locality  holds.  We  show  that,  even  though  we  need  to  alter  the  state  model  of 
standard  Separation  Logic,  we  do  not  need  to  change  any  of  the  inference  rules. 
We  justify  the  second  statement,  that  our  strong  locality  preserves  program 
behavior,  in  Section  2.  We  will  also  show  that  the  standard,  weaker  notion  of 
locality  is  not  behavior-preserving.  We  provide  some  justification  of  the  third 
statement,  that  behavior  preservation  significantly  simplifies  Separation  Logic 
metatheory,  in  Section  5  by  considering  four  specific  examples  in  detail.  As  a 
primer,  we  will  say  a  little  bit  about  each  example  here. 

The  first  simplification  that  we  show  is  in  regard  to  program  footprints,  as 
defined  and  analyzed  in  [12].  Informally,  a  footprint  of  a  program  is  a  set  of 
states  such  that,  given  the  program’s  behavior  on  those  states,  it  is  possible  to 
infer  all  of  the  program’s  behavior  on  all  other  states.  Footprints  are  useful  for 
giving  complete  specifications  of  programs  in  a  concise  way.  Intuitively,  locality 
should  tell  us  that  the  set  of  smallest  safe  states,  or  states  containing  the  minimal 
amount  of  resources  required  for  the  program  to  safely  execute,  should  always 
be  a  footprint.  However,  this  is  not  the  case  in  standard  Separation  Logic.  To 
quote  the  authors  in  [12],  the  intuition  that  the  smallest  safe  states  should  form 
a  footprint  “fails  due  to  the  subtle  nature  of  the  locality  condition.”  We  show 
that  in  the  context  of  behavior-preserving  locality,  the  set  of  smallest  safe  states 
does  indeed  form  a  footprint. 

The  second  simplification  regards  the  theory  of  data  refinement,  as  defined 
in [6]. Data refinement is a formalism of the common programming paradigm in
which an abstract module, or interface, is implemented by a concrete instantiation.
In the context of [6], our programming language consists of a standard one,
plus abstract module operations that are guaranteed to satisfy some specification.
We wish to show that, given concrete and abstract modules, and a relation
relating  their  equivalent  states,  any  execution  of  the  program  that  can  happen 
when  using  the  concrete  module  can  also  happen  when  using  the  abstract  one. 
We simplify the data refinement theory by eliminating the need for two somewhat
unintuitive requirements used in [6], called contents independence and
growing relations. Contents independence is a strengthening of locality that is
implied by the stronger behavior preservation. A growing relation is a technical
requirement guaranteeing that the area of memory used by the abstract module
is a subset of that used by the concrete one. It turns out that behavior
preservation is strong enough to completely eliminate the need to require growing
relations, without automatically implying that any relations are growing.
Therefore, we can prove refinement between some modules (e.g., ones that use
completely disjoint areas of memory) that the system of [6] cannot handle.

Our third metatheoretical simplification is in the context of Relational Separation
Logic, defined in [14]. Relational Separation Logic is a tool for reasoning
about the relationship between two executions on different programs. In [14],
soundness of the relational frame rule is initially shown to be dependent on programs
being deterministic. The author presents a reasonable solution for making
the frame rule sound in the presence of nondeterminism, but the solution is somewhat
unintuitive and, more importantly, a significant chunk of the paper (about
9 pages out of 41) is devoted to developing the technical details of the solution.
We show that in the context of behavior preservation, the relational frame
rule as initially defined is already sound in the presence of nondeterminism, so
that section of the paper is no longer needed.

The fourth simplification is minor, but still worth noting. For technical
reasons, the standard definition of locality does not play well with a model in which
the  total  amount  of  available  memory  is  finite.  Separation  Logic  generally  avoids 
this  issue  by  simply  using  an  infinite  space  of  memory.  This  works  fine,  but  there 
may  be  situations  in  which  we  wish  to  use  a  model  that  more  closely  represents 
what  is  actually  going  on  inside  our  computer.  While  Separation  Logic  can  be 
made  to  work  in  the  presence  of  finite  memory,  doing  so  is  not  a  trivial  matter. 
We  will  show  that  under  our  stronger  notion  of  locality,  no  special  treatment  is 
required  for  finite-sized  models. 

All  proofs  in  Sections  3  and  4  have  been  fully  mechanized  in  the  Coq  proof 
assistant  [7].  The  Coq  source  files,  along  with  their  conversions  to  pdf,  can  be 
found at the link to the technical report for this paper [5].

2  Locality  and  Behavior  Preservation 

In  standard  Separation  Logic  [8,13,15,4],  there  are  two  locality  properties, 
known  as  Safety  Monotonicity  and  the  Frame  Property,  that  together  imply 
soundness  of  the  frame  rule.  Safety  Monotonicity  says  that  any  time  a  program 
executes  safely  in  a  certain  state,  the  same  program  must  also  execute  safely  in 
any  larger  state  —  in  other  words,  unused  resources  cannot  cause  a  program  to 
crash.  The  Frame  Property  says  that  if  a  program  executes  safely  on  a  small 
state,  then  any  terminating  execution  of  the  program  on  a  larger  state  can  be 
tracked  back  to  some  terminating  execution  on  the  small  state  by  assuming  that 
the  extra  added  state  has  no  effect  and  is  unchanged.  Furthermore,  there  is  a 
third property, called Termination Monotonicity, that is required whenever we
are  interested  in  reasoning  about  divergence  (nontermination).  This  property 
says  that  if  a  program  executes  safely  and  never  diverges  on  a  small  state,  then 
it  cannot  diverge  on  any  larger  state. 

To describe these properties formally, we first formalize the idea of program
state. We will describe the theory somewhat informally here; full formal detail
will be described later in Section 4. We define states $\sigma$ to be members of
an abstract set $\Sigma$. We assume that whenever two states $\sigma_0$ and
$\sigma_1$ are “disjoint,” written $\sigma_0 \# \sigma_1$, they can be combined
to form the larger state $\sigma_0 \cdot \sigma_1$. Intuitively, two states are
disjoint when they occupy disjoint areas of memory.

We represent the semantic meaning of a program $C$ by a binary relation
$[\![C]\!]$. We use the common notational convention $a R b$ for a binary
relation $R$ to denote $(a, b) \in R$. Intuitively, $\sigma [\![C]\!] \sigma'$
means that, when executing $C$ on initial state $\sigma$, it is possible to
terminate in state $\sigma'$. Note that if $\sigma$ is related by $[\![C]\!]$ to
more than one state, this simply means that $C$ is a nondeterministic program.

We also define two special behaviors $\mathbf{bad}$ and $\mathbf{div}$:

- The notation $\sigma [\![C]\!] \mathbf{bad}$ means that $C$ can crash or get
  stuck when executed on $\sigma$, while

- The notation $\sigma [\![C]\!] \mathbf{div}$ means that $C$ can diverge
  (execute forever) when executed on $\sigma$.

As a notational convention, we use $\tau$ to range over elements of
$\Sigma \cup \{\mathbf{bad}, \mathbf{div}\}$. We require that for any state
$\sigma$ and program $C$, there is always at least one $\tau$ such that
$\sigma [\![C]\!] \tau$. In other words, every execution must either crash, go
on forever, or terminate in some state.

Now  we  can  define  the  properties  described  above  more  formally.  Following 
are  definitions  of  Safety  Monotonicity,  the  Frame  Property,  and  Termination 
Monotonicity,  respectively: 

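1.) $\neg\, \sigma_0 [\![C]\!] \mathbf{bad} \;\wedge\; \sigma_0 \# \sigma_1 \;\Longrightarrow\; \neg\, (\sigma_0 \cdot \sigma_1) [\![C]\!] \mathbf{bad}$

2.) $\neg\, \sigma_0 [\![C]\!] \mathbf{bad} \;\wedge\; (\sigma_0 \cdot \sigma_1) [\![C]\!] \sigma' \;\Longrightarrow\; \exists \sigma_0' .\ \sigma' = \sigma_0' \cdot \sigma_1 \;\wedge\; \sigma_0 [\![C]\!] \sigma_0'$

3.) $\neg\, \sigma_0 [\![C]\!] \mathbf{bad} \;\wedge\; \neg\, \sigma_0 [\![C]\!] \mathbf{div} \;\wedge\; \sigma_0 \# \sigma_1 \;\Longrightarrow\; \neg\, (\sigma_0 \cdot \sigma_1) [\![C]\!] \mathbf{div}$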
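The three properties can be checked by brute force on a small finite model. The following is only an illustrative sketch, not part of the formal development: states are taken to be heaps over a hypothetical two-address, two-value universe (`ADDRS`, `VALS`), a program is modeled as a function from a state to its set of outcomes, and the two example programs `zero_cell_1` and `crash_if_2_allocated` are invented for the illustration.

```python
from itertools import combinations, product

BAD, DIV = "bad", "div"
ADDRS, VALS = (1, 2), (0, 1)   # hypothetical tiny address and value ranges


def all_states():
    """Every heap over ADDRS and VALS, as a frozenset of (addr, val) pairs."""
    return [frozenset(zip(d, vs))
            for k in range(len(ADDRS) + 1)
            for d in combinations(ADDRS, k)
            for vs in product(VALS, repeat=k)]


def dom(state):
    return {addr for addr, _ in state}


def disjoint(s0, s1):
    return not (dom(s0) & dom(s1))


def safety_monotonicity(run, states):
    """Property 1: safe on s0 implies safe on every larger state s0 . s1."""
    return all(BAD in run(s0) or BAD not in run(s0 | s1)
               for s0 in states for s1 in states if disjoint(s0, s1))


def frame_property(run, states):
    """Property 2: a terminating run on s0 . s1 restricts to a run on s0."""
    for s0 in states:
        if BAD in run(s0):
            continue
        for s1 in states:
            if not disjoint(s0, s1):
                continue
            for out in run(s0 | s1):
                if out in (BAD, DIV):
                    continue
                if not any(r not in (BAD, DIV) and disjoint(r, s1)
                           and r | s1 == out for r in run(s0)):
                    return False
    return True


def termination_monotonicity(run, states):
    """Property 3: safe and non-diverging on s0 implies non-diverging on s0 . s1."""
    return all(BAD in run(s0) or DIV in run(s0) or DIV not in run(s0 | s1)
               for s0 in states for s1 in states if disjoint(s0, s1))


def zero_cell_1(state):
    """[1] := 0, crashing when address 1 is not allocated."""
    if 1 not in dom(state):
        return {BAD}
    return {frozenset((a, 0 if a == 1 else v) for a, v in state)}


def crash_if_2_allocated(state):
    """Deliberately non-local: fails because of a cell it never uses."""
    return {BAD} if 2 in dom(state) else {state}


STATES = all_states()
print(safety_monotonicity(zero_cell_1, STATES), frame_property(zero_cell_1, STATES))
print(safety_monotonicity(crash_if_2_allocated, STATES))
```

In this sketch the heap update satisfies all three properties, while the second program violates Safety Monotonicity because adding an unused cell makes it crash.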

The  standard  definition  of  locality  was  defined  in  this  way  because  it  is  the 
minimum  requirement  needed  to  make  the  frame  rule  sound  —  it  is  as  weak  as 
it  can  possibly  be  without  breaking  the  logic.  It  was  not  defined  to  correspond 
with  any  intuitive  notion  of  locality.  As  a  result,  there  are  two  subtleties  in  the 
definition  that  might  seem  a  bit  odd.  We  will  now  describe  these  subtleties  and 
the  changes  we  make  to  get  rid  of  them.  Note  that  we  are  not  arguing  in  this 
section  that  there  is  any  benefit  to  changing  locality  in  this  way  (other  than 
the  arguably  vacuous  benefit  of  corresponding  to  our  “intuition”  of  locality)  — 
the  benefit  will  become  clear  when  we  discuss  how  our  change  simplifies  the 
metatheory  in  Section  5. 

The first subtlety is that Termination Monotonicity only applies in one
direction. This means that we could have a program $C$ that runs forever on a
state $\sigma$, but when we add unused state, we suddenly lose the ability for that
infinite execution to occur. We can easily get rid of this subtlety by replacing
Termination Monotonicity with the following Termination Equivalence property:

$\neg\, \sigma_0 [\![C]\!] \mathbf{bad} \;\wedge\; \sigma_0 \# \sigma_1 \;\Longrightarrow\; \big( \sigma_0 [\![C]\!] \mathbf{div} \iff (\sigma_0 \cdot \sigma_1) [\![C]\!] \mathbf{div} \big)$

The second subtlety is that locality gives us a way of tracking an execution
on a large state back to a small one, but it does not allow for the other way
around. This means that there can be an execution on a state $\sigma$ that becomes
invalid when we add unused state. This subtlety is a little trickier to remedy
than the other. If we think of the Frame Property as really being a “Backwards
Frame Property,” in the sense that it only works in the direction from large state
to small state, then we clearly need to require a corresponding Forwards Frame
Property. We would like to say that if $C$ takes $\sigma_0$ to $\sigma_0'$ and we
add the unused state $\sigma_1$, then $C$ takes $\sigma_0 \cdot \sigma_1$ to
$\sigma_0' \cdot \sigma_1$:

$\sigma_0 [\![C]\!] \sigma_0' \;\wedge\; \sigma_0 \# \sigma_1 \;\Longrightarrow\; (\sigma_0 \cdot \sigma_1) [\![C]\!] (\sigma_0' \cdot \sigma_1)$

Unfortunately, there is no guarantee that $\sigma_0' \cdot \sigma_1$ is defined,
as the states might not occupy disjoint areas of memory. In fact, if $C$ causes
our initial state to grow, say by allocating memory, then there will always be
some $\sigma_1$ that is disjoint from $\sigma_0$ but not from $\sigma_0'$ (e.g.,
take $\sigma_1$ to be exactly that allocated memory). Therefore, it seems as if
we are doomed to lose behavior in such a situation upon adding unused state.

There is, however, a solution worth considering: we could disallow programs
from ever increasing state. In other words, we can require that whenever $C$ takes
$\sigma_0$ to $\sigma_0'$, the area of memory occupied by $\sigma_0'$ must be a
subset of that occupied by $\sigma_0$. In this way, anything that is disjoint
from $\sigma_0$ must also be disjoint from $\sigma_0'$, so we will not lose any
behavior. Formally, we express this property as:

$\sigma_0 [\![C]\!] \sigma_0' \;\Longrightarrow\; (\forall \sigma_1 .\ \sigma_0 \# \sigma_1 \Longrightarrow \sigma_0' \# \sigma_1)$

We  can  conveniently  combine  this  property  with  the  previous  one  to  express 
the  Forwards  Frame  Property  as  the  following  condition: 

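$\sigma_0 [\![C]\!] \sigma_0' \;\wedge\; \sigma_0 \# \sigma_1 \;\Longrightarrow\; \sigma_0' \# \sigma_1 \;\wedge\; (\sigma_0 \cdot \sigma_1) [\![C]\!] (\sigma_0' \cdot \sigma_1)$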
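The difference between a state-preserving update and a state-growing allocation can be seen on a small finite model. The sketch below is illustrative only: the bounded universe (`ADDRS`, `VALS`) and both example programs are assumptions made for the demonstration, and treating a full heap as divergence is purely an artifact of the bounded address range.

```python
from itertools import combinations, product

BAD, DIV = "bad", "div"
ADDRS, VALS = (1, 2), (0, 1)   # hypothetical tiny address and value ranges


def all_states():
    """Every heap over ADDRS and VALS, as a frozenset of (addr, val) pairs."""
    return [frozenset(zip(d, vs))
            for k in range(len(ADDRS) + 1)
            for d in combinations(ADDRS, k)
            for vs in product(VALS, repeat=k)]


def dom(state):
    return {addr for addr, _ in state}


def disjoint(s0, s1):
    return not (dom(s0) & dom(s1))


def forwards_frame(run, states):
    """s0 [[C]] s0' and s0 # s1 imply s0' # s1 and (s0 . s1) [[C]] (s0' . s1)."""
    for s0 in states:
        for s1 in states:
            if not disjoint(s0, s1):
                continue
            for out in run(s0):
                if out in (BAD, DIV):
                    continue
                if not disjoint(out, s1) or (out | s1) not in run(s0 | s1):
                    return False
    return True


def zero_cell_1(state):
    """[1] := 0, crashing when address 1 is not allocated; never grows the state."""
    if 1 not in dom(state):
        return {BAD}
    return {frozenset((a, 0 if a == 1 else v) for a, v in state)}


def alloc_outside_state(state):
    """cons(0) in the standard style: the new cell comes from outside the state."""
    unused = [a for a in ADDRS if a not in dom(state)]
    # A bounded universe can fill up; that case is modeled as divergence here
    # only to keep the relation total.
    return {state | {(a, 0)} for a in unused} or {DIV}


STATES = all_states()
print(forwards_frame(zero_cell_1, STATES))          # update in place
print(forwards_frame(alloc_outside_state, STATES))  # allocation that grows the state
```

The allocation fails the check exactly as described above: starting from the empty heap it may pick address 1, and the resulting state is no longer disjoint from an added state that already holds address 1.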

At first glance, it may seem imprudent to impose this requirement, as it
apparently disallows memory allocation. However, it is in fact still possible to
model memory allocation; we just have to be a little clever about it.
Specifically, we can include a set of memory locations in our state that we designate
to be the “free list^.” When memory is allocated, all allocated cells must be
taken from the free list. Contrast this to standard Separation Logic, in which
newly-allocated heap cells are taken from outside the state. In the next section,
we will show that we can add a free list in this way to the model of Separation
Logic without requiring a change to any of the inference rules.

^ The free list is actually a set rather than a list; we use the term “free list” because
it is commonly used in the context of memory allocation.

We conclude this section with a brief justification of the term “behavior
preservation.” Given that $C$ runs safely on a state $\sigma_0$, we think of a
behavior of $C$ on $\sigma_0$ as a particular execution, which can either diverge
or terminate at some state $\sigma_0'$. The Forwards Frame Property tells us that
execution on a larger state $\sigma_0 \cdot \sigma_1$ simulates execution on the
smaller state $\sigma_0$, while the Backwards (standard) Frame Property says that
execution on the smaller state simulates execution on the larger one. Since
standard locality only requires simulation in one direction, it is possible for
a program to have fewer valid executions, or behaviors, when executing on
$\sigma_0 \cdot \sigma_1$ as opposed to just $\sigma_0$. Our stronger locality
disallows this from happening, enforcing a bisimulation under which all behaviors
are preserved when extra resources are added.

$$
\begin{array}{rcl}
E & ::= & E + E' \mid E - E' \mid E \times E' \mid \ldots \mid -1 \mid 0 \mid 1 \mid \ldots \mid x \mid y \mid \ldots \\
B & ::= & E = E' \mid \mathsf{false} \mid B \Rightarrow B' \\
P, Q & ::= & B \mid \mathsf{false} \mid \mathsf{emp} \mid E \mapsto E' \mid P \Rightarrow Q \mid \forall x . P \mid P * Q \\
C & ::= & \mathsf{skip} \mid x := E \mid x := [E] \mid [E] := E' \\
  & \mid & x := \mathsf{cons}(E_1, \ldots, E_n) \mid \mathsf{free}(E) \mid C ; C' \\
  & \mid & \mathsf{if}\ B\ \mathsf{then}\ C\ \mathsf{else}\ C' \mid \mathsf{while}\ B\ \mathsf{do}\ C
\end{array}
$$

Fig. 1. Assertion and Program Syntax

3  Impact  on  a  Concrete  Separation  Logic 

We  will  now  present  one  possible  RAM  model  that  enforces  our  stronger  notion  of 
locality  without  affecting  the  inference  rules  of  standard  Separation  Logic.  In  the 
standard  model  of  [13],  a  program  state  consists  of  two  components:  a  variable 
store  and  a  heap.  When  new  memory  is  allocated,  the  memory  is  “magically” 
added  to  the  heap.  As  shown  in  Section  2,  we  cannot  allow  allocation  to  increase 
the  program  state  in  this  way.  Instead,  we  will  include  an  explicit  free  list,  or 
a  set  of  memory  locations  available  for  allocation,  inside  of  the  program  state. 
Thus a state is now a triple $(s, h, f)$ consisting of a store, a heap, and a free list,
with the heap and free list occupying disjoint areas of memory. Newly-allocated
memory will always come from the free list, while deallocated memory goes back
into the free list. Since the standard formulation of Separation Logic assumes that
memory is infinite and hence that allocation never fails, we similarly require that
the free list be infinite. More specifically, we require that there is some location
$n$ such that all locations above $n$ are in the free list.

Formally,  states  are  defined  as  follows: 

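$$
\begin{array}{ll}
\text{Var} & V = \{x, y, z, \ldots\} \\
\text{Store} & S = V \to \mathbb{Z} \\
\text{Heap} & H = \mathbb{N} \rightharpoonup_{\mathrm{fin}} \mathbb{Z} \\
\text{Free List} & F = \{N \in \mathcal{P}(\mathbb{N}) \mid \exists n .\ \forall k > n .\ k \in N\} \\
\text{State} & \Sigma = \{(s, h, f) \in S \times H \times F \mid \mathrm{dom}(h) \cap f = \emptyset\}
\end{array}
$$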

As a point of clarification, we are not claiming here that including the free
list in the state model is a novel idea. Other systems (e.g., [12]) have made use of
a very similar idea. The two novel contributions that we will show in this section
are: (1) that a state model which includes an explicit free list can provide a
behavior-preserving semantics, and (2) that the corresponding program logic can
be made to be completely backwards-compatible with standard Separation Logic
(meaning that any valid Separation Logic derivation is also a valid derivation in
our logic).

Assertion  syntax  and  program  syntax  are  given  in  Figure  1,  and  are  exactly 
the  same  as  in  the  standard  model  for  Separation  Logic. 

Our satisfaction judgement $(s, h, f) \models P$ for an assertion $P$ is defined by
ignoring the free list and only considering whether $(s, h)$ satisfies $P$. Our
definition of $(s, h) \models P$ is identical to that of standard Separation Logic.

The small-step operational semantics for our machine is defined as
$\sigma, C \longrightarrow \sigma', C'$ and is straightforward; the full details
can be found in the extended TR. The most interesting aspects are the rules for
allocation and deallocation, since they make use of the free list.
$x := \mathsf{cons}(E_1, \ldots, E_n)$ allocates a nondeterministically-chosen
contiguous block of $n$ heap cells from the free list, while $\mathsf{free}(E)$
puts the single heap cell pointed to by $E$ back onto the free list. None of the
operations make use of any memory existing outside the program state; this is
the key for obtaining behavior preservation.

To see how our state model fits into the structure defined in Section 2, we
need to define the state combination operator. Given two states
$\sigma_1 = (s_1, h_1, f_1)$ and $\sigma_2 = (s_2, h_2, f_2)$, the combined state
$\sigma_1 \cdot \sigma_2$ is equal to $(s_1, h_1 \uplus h_2, f_1)$ if $s_1 = s_2$,
$f_1 = f_2$, and the domains of $h_1$ and $h_2$ are disjoint; otherwise, the
combination is undefined. Note that this combined state satisfies the requisite
condition $\mathrm{dom}(h_1 \uplus h_2) \cap f_1 = \emptyset$ because $h_1$,
$h_2$, and $f_1$ are pairwise disjoint by assumption. The most important aspect
of this definition of state combination is that we can never change the free list
when adding extra resources. This guarantees behavior preservation of the
nondeterministic memory allocator because the allocator’s set of possible
behaviors is precisely defined by the free list.
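The role of the shared free list can be illustrated with a small sketch. Everything named here is an assumption made for the illustration: the `State` tuple, the single-cell `alloc`, the finite free list (the model above requires an infinite one), and the `alloc_standard` function used for contrast.

```python
from typing import NamedTuple, Optional


class State(NamedTuple):
    store: tuple       # sorted (variable, value) pairs
    heap: frozenset    # (address, value) pairs
    free: frozenset    # addresses available for allocation


def heap_dom(heap):
    return {addr for addr, _ in heap}


def combine(s1: State, s2: State) -> Optional[State]:
    """sigma1 . sigma2: same store, same free list, disjoint heaps; else undefined."""
    if s1.store != s2.store or s1.free != s2.free:
        return None
    if heap_dom(s1.heap) & heap_dom(s2.heap):
        return None
    return State(s1.store, s1.heap | s2.heap, s1.free)


def alloc(state: State, var: str, value: int) -> set:
    """var := cons(value): one successor state per address on the free list."""
    successors = set()
    for addr in state.free:
        store = dict(state.store)
        store[var] = addr
        successors.add(State(tuple(sorted(store.items())),
                             state.heap | {(addr, value)},
                             state.free - {addr}))
    return successors


def alloc_standard(heap: frozenset, universe: frozenset) -> set:
    """Standard-style choice: any address of the universe outside the heap domain."""
    return set(universe - heap_dom(heap))


def chosen(successors, var):
    """Addresses that the allocator may have stored into var."""
    return {dict(s.store)[var] for s in successors}


store = (("x", 0),)
free = frozenset({10, 11})
small = State(store, frozenset(), free)
extra = State(store, frozenset({(5, 7)}), free)
big = combine(small, extra)

# With an explicit free list, adding the cell at address 5 changes nothing.
print(chosen(alloc(small, "x", 0), "x"), chosen(alloc(big, "x", 0), "x"))

# Without one, the added cell removes address 5 from the allocator's choices.
universe = frozenset({5, 10, 11})
print(alloc_standard(small.heap, universe), alloc_standard(big.heap, universe))
```

Because `combine` refuses to merge states whose free lists differ, the allocator sees the same free list before and after the extra heap cell is added, so its set of possible results is unchanged.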

In order to formally compare our logic to “standard” Separation Logic, we
need to provide the standard version of the small-step operational semantics,
denoted as $(s, h), C \rightsquigarrow (s', h'), C'$. This semantics does not have
explicit free lists in the states, but instead treats all locations outside the
domain of $h$ as free. We formalize this semantics in the extended TR, and prove
the following relationship between the two operational semantics:

$$(s, h), C \rightsquigarrow (s', h'), C' \iff \exists f, f' .\ (s, h, f), C \longrightarrow (s', h', f'), C'$$


The inference rules in the form $\vdash \{P\}\ C\ \{Q\}$ for our logic are the same
as those used in standard Separation Logic. In the extended TR, we state all the
inference rules and prove that our logic is both sound and complete; therefore,
behavior preservation does not cause any complications in the usage of Separation
Logic. Any specification that can be proved using the standard model can also be
proved using our model. Also in the TR, we prove that our model enjoys the
stronger, behavior-preserving notion of locality described in Section 2.

Even though our logic works exactly the same as standard Separation Logic,
our underlying model now has this free list within the state. Therefore, if we
so desire, we could define additional assertions and inference rules allowing for
more precise reasoning involving the free list. One idea is to have a separate,
free list section of assertions in which we write, for example, $E * \mathsf{true}$
to claim that $E$ is a part of the free list. Then the axiom for free would look like:

$$\{E \mapsto -;\ \mathsf{true}\}\ \mathsf{free}(E)\ \{\mathsf{emp};\ E * \mathsf{true}\}$$

4  The  Abstract  Logic 

In  order  to  clearly  explain  how  our  stronger  notion  of  locality  resolves  the 
metatheoretical  issues  described  in  Section  1,  we  will  first  formally  describe  how 
our  locality  fits  into  a  context  similar  to  that  of  Abstract  Separation  Logic  [4]. 
With  a  minor  amount  of  work,  the  logic  of  Section  3  can  be  molded  into  a 
particular  instance  of  the  abstract  logic  presented  here. 

We define a separation algebra to be a set of states $\Sigma$, along with a partial
associative and commutative operator $\cdot : \Sigma \to \Sigma \rightharpoonup \Sigma$.
The disjointness relation $\sigma_0 \# \sigma_1$ holds iff $\sigma_0 \cdot \sigma_1$
is defined, and the substate relation $\sigma_0 \preceq \sigma_1$ holds iff there
is some $\sigma_0'$ such that $\sigma_0 \cdot \sigma_0' = \sigma_1$. A particular
element of $\Sigma$ is designated as a unit state, denoted $u$, with the property
that for any $\sigma$, $\sigma \# u$ and $\sigma \cdot u = \sigma$. We require the
$\cdot$ operator to be cancellative, meaning that
$\sigma \cdot \sigma_0 = \sigma \cdot \sigma_1 \Longrightarrow \sigma_0 = \sigma_1$.

An action is a set of pairs of type
$(\Sigma \cup \{\mathbf{bad}, \mathbf{div}\}) \times (\Sigma \cup \{\mathbf{bad}, \mathbf{div}\})$.
We require the following two properties: (1) actions always relate $\mathbf{bad}$
to $\mathbf{bad}$ and $\mathbf{div}$ to $\mathbf{div}$, and never relate
$\mathbf{bad}$ or $\mathbf{div}$ to anything else; and (2) actions are total, in
the sense that for any $\tau$, there exists some $\tau'$ such that $\tau A \tau'$
(recall from Section 2 that we use $\tau$ to range over elements of
$\Sigma \cup \{\mathbf{bad}, \mathbf{div}\}$). Note that these two requirements
are preserved over the standard composition of relations, as well as over both
finitary and infinite unions. We write $\mathit{Id}$ to represent the identity
action $\{(\tau, \tau) \mid \tau \in \Sigma \cup \{\mathbf{bad}, \mathbf{div}\}\}$.

Note that it is more standard in the literature to have the domain of actions
range only over $\Sigma$; we use $\Sigma \cup \{\mathbf{bad}, \mathbf{div}\}$ here
because it has the pleasant effect of making $[\![C_1; C_2]\!]$ correspond
precisely to standard composition. Intuitively, once an execution goes wrong, it
continues to go wrong, and once an execution diverges, it continues to diverge.

A  local  action  is  an  action  A  that  satisfies  the  following  four  properties,  which 
respectively  correspond  to  Safety  Monotonicity,  Termination  Equivalence,  the 
Forwards  Frame  Property,  and  the  Backwards  Frame  Property  from  Section  2: 

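1.) $\neg\, \sigma_0 A\, \mathbf{bad} \;\wedge\; \sigma_0 \# \sigma_1 \;\Longrightarrow\; \neg\, (\sigma_0 \cdot \sigma_1) A\, \mathbf{bad}$

2.) $\neg\, \sigma_0 A\, \mathbf{bad} \;\wedge\; \sigma_0 \# \sigma_1 \;\Longrightarrow\; \big( \sigma_0 A\, \mathbf{div} \iff (\sigma_0 \cdot \sigma_1) A\, \mathbf{div} \big)$

3.) $\sigma_0 A\, \sigma_0' \;\wedge\; \sigma_0 \# \sigma_1 \;\Longrightarrow\; \sigma_0' \# \sigma_1 \;\wedge\; (\sigma_0 \cdot \sigma_1) A\, (\sigma_0' \cdot \sigma_1)$

4.) $\neg\, \sigma_0 A\, \mathbf{bad} \;\wedge\; (\sigma_0 \cdot \sigma_1) A\, \sigma' \;\Longrightarrow\; \exists \sigma_0' .\ \sigma' = \sigma_0' \cdot \sigma_1 \;\wedge\; \sigma_0 A\, \sigma_0'$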

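$$
\begin{array}{c}
C ::= c \mid C_1; C_2 \mid C_1 + C_2 \mid C^{*} \\[4pt]
\forall c .\ [\![c]\!] \in \mathit{LocAct} \qquad
[\![C_1; C_2]\!] = [\![C_1]\!]; [\![C_2]\!] \qquad
[\![C_1 + C_2]\!] = [\![C_1]\!] \cup [\![C_2]\!] \\[4pt]
[\![C^{*}]\!] = \bigcup_{n \in \mathbb{N}} [\![C]\!]^{n} \qquad
[\![C]\!]^{0} = \mathit{Id} \qquad
[\![C]\!]^{n+1} = [\![C]\!]; [\![C]\!]^{n}
\end{array}
$$

Fig. 2. Command Definition and Denotational Semantics

We denote the set of all local actions by $\mathit{LocAct}$. We now show that the
set of local actions is closed under composition and (possibly infinite) union.
We use the notation $A_1; A_2$ to denote composition, and $\bigcup \mathcal{A}$ to
denote union (where $\mathcal{A}$ is a possibly infinite set of actions). The
formal definitions of these operations follow. Note that we require that
$\mathcal{A}$ be non-empty. This is necessary because $\bigcup \emptyset$ is
$\emptyset$, which is not a valid action. Unless otherwise stated, whenever we
write $\bigcup \mathcal{A}$, there will always be an implicit assumption that
$\mathcal{A} \neq \emptyset$.

$\tau\, (A_1; A_2)\, \tau' \iff \exists \tau'' .\ \tau A_1 \tau'' \;\wedge\; \tau'' A_2 \tau'$

$\tau\, \big(\bigcup \mathcal{A}\big)\, \tau' \iff \exists A \in \mathcal{A} .\ \tau A \tau' \qquad (\mathcal{A} \neq \emptyset)$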

Lemma  1.  If  Ai  and  A2  are  local  actions,  then  Ai;  A2  is  a  local  action. 

Lemma 2. If every $A$ in the set $\mathcal{A}$ is a local action, then $\bigcup \mathcal{A}$ is a local action.
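Composition and union of actions, and the way bad and div propagate through them, can be sketched with finite relations. This is an illustration of the two operations and of totality only; the integer "states" below are a hypothetical stand-in with no separation structure, so locality itself is not being checked, and the helper names are invented for the sketch.

```python
BAD, DIV = "bad", "div"
STATES = range(4)   # hypothetical abstract states; no separation structure here


def lift(step):
    """Turn a function state -> set of outcomes into an action over STATES."""
    action = {(BAD, BAD), (DIV, DIV)}   # bad and div are only related to themselves
    for s in STATES:
        action |= {(s, out) for out in step(s)}
    return action


def compose(a1, a2):
    """tau (A1; A2) tau'  iff  some tau'' has tau A1 tau'' and tau'' A2 tau'."""
    return {(t, t2) for (t, mid) in a1 for (mid2, t2) in a2 if mid == mid2}


def union(actions):
    """Union of a non-empty collection of actions."""
    actions = list(actions)
    if not actions:
        raise ValueError("the union of an empty set of actions is not an action")
    return set().union(*actions)


def is_total(action):
    """Every element of STATES, as well as bad and div, is related to something."""
    return all(any(src == t for src, _ in action) for t in [*STATES, BAD, DIV])


inc = lift(lambda s: {s + 1} if s < 3 else {BAD})   # crashes at the top state
loop = lift(lambda s: {DIV})                        # always diverges

twice = compose(inc, inc)
either = union([inc, loop])
print(sorted(twice, key=str))
print(is_total(twice), is_total(either))
```

Because `inc` relates bad to bad, the run of `twice` from state 2 that crashes in its second step shows up as the pair `(2, "bad")`, with no special case needed in `compose`.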

Figure 2 defines our abstract program syntax and semantics. The language
consists of primitive commands, sequencing ($C_1; C_2$), nondeterministic choice
($C_1 + C_2$), and finite iteration ($C^{*}$). The semantics of primitive commands
are abstracted; the only requirement is that they are local actions. Therefore, from
the two previous lemmas and the trivial fact that $\mathit{Id}$ is a local action,
it is clear that the semantics of every program is a local action.

Note that our concrete language used if statements and while loops. As
shown in [4], it is possible to represent if and while constructs with finite
iteration and nondeterministic choice by including a primitive command
$\mathsf{assume}(B)$, which does nothing if the boolean expression $B$ is true,
and diverges otherwise.
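For reference, the encoding referred to here is customarily written as follows. This is a sketch of the usual construction rather than a quotation from [4]; it assumes that $\neg B$ is expressible as a boolean expression (in the syntax of Figure 1 it could be written $B \Rightarrow \mathsf{false}$).

```latex
\begin{aligned}
\mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2
  &\;\triangleq\; (\mathsf{assume}(B); C_1) + (\mathsf{assume}(\neg B); C_2) \\
\mathsf{while}\ B\ \mathsf{do}\ C
  &\;\triangleq\; (\mathsf{assume}(B); C)^{*};\ \mathsf{assume}(\neg B)
\end{aligned}
```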

Now that we have defined the interpretation of programs as local actions, we
can talk about the meaning of a triple $\{P\}\ C\ \{Q\}$. We define an assertion
$P$ to be a set of states, and we say that a state $\sigma$ satisfies $P$ iff
$\sigma \in P$. We can then define the separating conjunction as follows:

$P * Q = \{\sigma \in \Sigma \mid \exists \sigma_0 \in P, \sigma_1 \in Q .\ \sigma = \sigma_0 \cdot \sigma_1\}$
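Read over a concrete heap model, this definition is a one-line set comprehension. The sketch below assumes, purely for illustration, that states are heaps encoded as frozensets of (address, value) pairs and that combination is the union of heaps with disjoint domains.

```python
def dom(state):
    return {addr for addr, _ in state}


def sep_conj(P, Q):
    """P * Q: every state that splits into a P-part and a disjoint Q-part."""
    return {s0 | s1 for s0 in P for s1 in Q if not (dom(s0) & dom(s1))}


cell1 = frozenset({(1, 0)})   # the heap holding 0 at address 1
cell2 = frozenset({(2, 5)})   # the heap holding 5 at address 2
P, Q = {cell1}, {cell2}
emp = {frozenset()}           # the assertion satisfied only by the empty heap

print(sep_conj(P, Q))   # the single two-cell heap
print(sep_conj(P, P))   # empty: a cell cannot be split off twice
```

In this encoding `emp` acts as a unit for `sep_conj`, mirroring the unit state of the separation algebra.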

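Given an assignment of primitive commands to local actions, we say that a
triple is valid, written $\models \{P\}\ C\ \{Q\}$, just when the following two
properties hold for all states $\sigma$ and $\sigma'$:

1.) $\sigma \in P \;\Longrightarrow\; \neg\, \sigma [\![C]\!] \mathbf{bad}$

2.) $\sigma \in P \;\wedge\; \sigma [\![C]\!] \sigma' \;\Longrightarrow\; \sigma' \in Q$

$$
\frac{\neg\, \sigma [\![c]\!] \mathbf{bad}}
     {\vdash \{\{\sigma\}\}\ c\ \{\{\sigma' \mid \sigma [\![c]\!] \sigma'\}\}}\ (\text{PRIM})
\qquad
\frac{\vdash \{P\}\ C_1\ \{Q\} \quad \vdash \{Q\}\ C_2\ \{R\}}
     {\vdash \{P\}\ C_1; C_2\ \{R\}}\ (\text{SEQ})
$$

$$
\frac{\vdash \{P\}\ C_1\ \{Q\} \quad \vdash \{P\}\ C_2\ \{Q\}}
     {\vdash \{P\}\ C_1 + C_2\ \{Q\}}\ (\text{PLUS})
\qquad
\frac{\vdash \{P\}\ C\ \{P\}}
     {\vdash \{P\}\ C^{*}\ \{P\}}\ (\text{STAR})
$$

$$
\frac{\vdash \{P\}\ C\ \{Q\}}
     {\vdash \{P * R\}\ C\ \{Q * R\}}\ (\text{FRAME})
\qquad
\frac{P' \subseteq P \quad \vdash \{P\}\ C\ \{Q\} \quad Q \subseteq Q'}
     {\vdash \{P'\}\ C\ \{Q'\}}\ (\text{CONSEQ})
$$

$$
\frac{\forall i \in I .\ \vdash \{P_i\}\ C\ \{Q_i\}}
     {\vdash \{\bigcup P_i\}\ C\ \{\bigcup Q_i\}}\ (\text{DISJ})
\qquad
\frac{\forall i \in I .\ \vdash \{P_i\}\ C\ \{Q_i\} \quad I \neq \emptyset}
     {\vdash \{\bigcap P_i\}\ C\ \{\bigcap Q_i\}}\ (\text{CONJ})
$$

Fig. 3. Inference Rules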

The inference rules of the logic are given in Figure 3. Note that we are taking
a significant presentation shortcut here in the inference rule for primitive
commands. Specifically, we assume that we know the exact local action $[\![c]\!]$
of each primitive command $c$. This assumption makes sense when we define our
own primitive commands, as we do in the logic of Section 3. However, in a more
general setting, we might be provided with an opaque function along with a
specification (precondition and postcondition) for the function. Since the function is
opaque, we must consider it to be a primitive command in the abstract setting.
Yet we do not know how it is implemented, so we do not know its precise local
action. In [4], the authors provide a method for inferring a “best” local action
from the function’s specification. With a decent amount of technical development,
we can do something similar here, using our stronger definition of locality.
These details can be found in the technical report [5].

Given  this  assumption,  we  prove  soundness  and  completeness  of  our  abstract 
logic.  The  details  of  the  proof  can  be  found  in  our  Coq  implementation  [5]. 

Theorem  1  (Soundness  and  Completeness). 

$\vdash \{P\}\ C\ \{Q\} \iff \models \{P\}\ C\ \{Q\}$

5  Simplifying  Separation  Logic  Metatheory 

Now that we have an abstracted formalism of our behavior-preserving local
actions, we will resolve each of the four metatheoretical issues described in
Section 1.

5.1 Footprints and Smallest Safe States

Consider a situation in which we are handed a program $C$ along with a
specification of what this program does. The specification consists of a set of axioms;
each axiom has the form $\{P\}\ C\ \{Q\}$ for some precondition $P$ and postcondition
$Q$. A common question to ask would be: is this specification complete? In other
words, if the triple $\models \{P\}\ C\ \{Q\}$ is valid for some $P$ and $Q$, then
is it possible to derive $\vdash \{P\}\ C\ \{Q\}$ from the provided specification?

In  standard  Separation  Logic,  it  can  be  extremely  difficult  to  answer  this 
question.  In  [12],  the  authors  conduct  an  in-depth  study  of  various  conditions 
and  circumstances  under  which  it  is  possible  to  prove  that  certain  specifications 
are  complete.  However,  in  the  general  case,  there  is  no  easy  way  to  prove  this. 

We  can  show  that  under  our  assumption  of  behavior  preservation,  there  is 
a  very  easy  way  to  guarantee  that  a  specification  is  complete.  In  particular,  a 
specification  that  describes  the  exact  behavior  of  C  on  all  of  its  smallest  safe 
states is always complete. Formally, a smallest safe state is a state $\sigma$ such that
$\neg\, \sigma [\![C]\!] \mathbf{bad}$ and, for all $\sigma' \prec \sigma$, $\sigma' [\![C]\!] \mathbf{bad}$.
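On a finite model, the smallest safe states of a program can be enumerated directly from this definition. The sketch below is illustrative only: the bounded heap universe (`ADDRS`, `VALS`) and the example program `zero_cell_1` are assumptions, and the strict substate relation is taken to be strict inclusion of heaps.

```python
from itertools import combinations, product

BAD = "bad"
ADDRS, VALS = (1, 2), (0, 1)   # hypothetical tiny address and value ranges


def all_states():
    """Every heap over ADDRS and VALS, as a frozenset of (addr, val) pairs."""
    return [frozenset(zip(d, vs))
            for k in range(len(ADDRS) + 1)
            for d in combinations(ADDRS, k)
            for vs in product(VALS, repeat=k)]


def dom(state):
    return {addr for addr, _ in state}


def zero_cell_1(state):
    """[1] := 0, crashing when address 1 is not allocated."""
    if 1 not in dom(state):
        return {BAD}
    return {frozenset((a, 0 if a == 1 else v) for a, v in state)}


def smallest_safe_states(run, states):
    """Safe states all of whose strict substates are unsafe."""
    safe = [s for s in states if BAD not in run(s)]
    # On frozensets, "<" is the strict-subset test, i.e. the strict substate
    # relation of this heap model.
    return {s for s in safe if all(BAD in run(t) for t in states if t < s)}


print(smallest_safe_states(zero_cell_1, all_states()))
```

For this program the smallest safe states are exactly the one-cell heaps at address 1, one for each possible initial value of the cell.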

To see that such a specification may not be complete in standard Separation
Logic, we borrow an example from [12]. Consider the program $C$, defined as
$x := \mathsf{cons}(0);\ \mathsf{free}(x)$. This program simply allocates a single
cell and then frees it. Under the standard model, the smallest safe states are
those of the form $(s, \emptyset)$ for any store $s$. For simplicity, assume that
the only variables in the store are $x$ and $y$. Define the specification to be
the infinite set of triples that have the following form, for any $a, b$ in
$\mathbb{Z}$, and any $a'$ in $\mathbb{N}$:

$\{x = a \wedge y = b \wedge \mathsf{emp}\}\ C\ \{x = a' \wedge y = b \wedge \mathsf{emp}\}$

Note that $a'$ must be in $\mathbb{N}$ because only valid unallocated memory
addresses can be assigned into $x$. It should be clear that this specification
describes the exact behavior on all smallest safe states of $C$. Now we claim
that the following triple is valid, but there is no way to derive it from the
specification.

$\{x = a \wedge y = b \wedge y \mapsto -\}\ C\ \{x = a' \wedge y = b \wedge y \mapsto - \wedge a' \neq b\}$

The  triple  is  clearly  valid  because  a'  must  be  a  memory  address  that  was  initially 
unallocated,  while  address  b  was  initially  allocated.  Nevertheless,  there  will  not 
be  any  way  to  derive  this  triple,  even  if  we  come  up  with  new  assertion  syntax 
or  inference  rules.  The  behavior  of  C  on  the  larger  state  is  different  from  the 
behavior  on  the  small  one,  but  there  is  no  way  to  recover  this  fact  once  we  make 
C  opaque.  It  can  be  shown  (see  [12])  that  if  we  add  triples  of  the  above  form  to 
our  specification,  then  we  will  obtain  a  complete  specification  for  C.  Yet  there 
is  no  straightforward  way  to  see  that  such  a  specification  is  complete. 

We  will  now  formally  prove  that,  in  our  system,  there  is  a  canonical  form 
for  complete  specification.  We  first  note  that  we  will  need  to  assume  that  our 
set  of  states  is  well-founded  with  respect  to  the  substate  relation  (i.e.,  there 
is  no  infinite  strictly-decreasing  chain  of  states).  This  assumption  is  true  for 
most standard models of Separation Logic, and furthermore, there is no reason
to  intuitively  believe  that  the  smallest  safe  states  should  be  able  to  provide  a 
complete  specification  when  the  assumption  is  not  true. 

We say that a specification $\Psi$ is complete for $C$ if, whenever
$\models \{P\}\ C\ \{Q\}$ is valid, the triple $\vdash \{P\}\ C\ \{Q\}$ is derivable
using only the inference rules that are not specific to the structure of $C$
(i.e., the frame, consequence, disjunction, and conjunction rules), plus the
following axiom rule:

$$\frac{\{P\}\ C\ \{Q\} \in \Psi}{\vdash \{P\}\ C\ \{Q\}}$$

For any $\sigma$, let $\sigma [\![C]\!]$ denote the set of all $\sigma'$ such that
$\sigma [\![C]\!] \sigma'$. For any set of states $S$, we define a canonical
specification on $S$ as the set of triples of the form
$\{\{\sigma\}\}\ C\ \{\sigma [\![C]\!]\}$ for any state $\sigma \in S$. If there
exists a canonical specification on $S$ that is complete for $C$, then we say
that $S$ forms a footprint for $C$. We can then prove the following theorem (see
the extended TR):

Theorem  2.  For  any  program  C,  the  set  of  all  smallest  safe  states  of  C  forms 
a  footprint  for  C. 

Note  that  while  this  theorem  guarantees  that  the  canonical  specification  is 
complete,  we  may  not  actually  be  able  to  write  down  the  specification  simply 
because  the  assertion  language  is  not  expressive  enough.  This  would  be  the  case 
for  the  behavior-preserving  nondeterministic  memory  allocator  if  we  used  the 
assertion  language  presented  in  Section  3.  We  could,  however,  express  canonical 
specifications  in  that  system  by  extending  the  assertion  language  to  talk  about 
the  free  list  (as  briefly  discussed  at  the  end  of  Section  3). 


5.2  Data  Refinement 

In [6], the goal is to formalize the concept of having a concrete module correctly
implement an abstract one, within the context of Separation Logic. Specifically,
the authors prove that as long as a client program “behaves nicely,” any
execution of the program using the concrete module can be tracked to a corresponding
execution using the abstract module. The client states in the corresponding
executions are identical, so the proof shows that a well-behaved client cannot tell
the difference between the concrete and abstract modules.

To get their proof to work out, the authors require two somewhat odd
properties to hold. The first is called contents independence, and is an extra condition
on top of the standard locality conditions. The second is called a growing
relation: it refers to the relation connecting a state of the abstract module to its
logically equivalent state(s) in the concrete module. All relations connecting the
abstract and concrete modules in this way are required to be growing, which
means that the domain of memory used by the abstract state must be a subset
of that used by the concrete state. This is a somewhat unintuitive and
restrictive requirement which is needed for purely technical reasons. We will show that
behavior preservation completely eliminates the need for both contents
independence and growing relations.

We now provide a formal setting for the data refinement theory. This formal
setting is similar to the one in [6], but we will make some minor alterations to
simplify the presentation. The programming language is defined as:

$C ::= \mathsf{skip} \mid c \mid m \mid C_1; C_2 \mid \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2 \mid \mathsf{while}\ B\ \mathsf{do}\ C$

Here, $c$ is a primitive command (sometimes referred to as “client operation” in
this context), and $m$ is a module command taken from an abstracted set
$\mathit{MOp}$ (e.g., a memory manager might implement the two module commands
cons and free).

The abstracted client and module commands are assumed to have a semantics
mapping them to particular local actions. We of course use our behavior-preserving
notion of “local” here, whereas in [6], the authors use the three properties
of safety monotonicity, the (backwards) frame property, and a new property
called contents independence. It is trivial to show that behavior preservation
implies contents independence, as contents independence is essentially a forwards
frame property that can only be applied under special circumstances.

A module is a pair $(p, \eta)$ representing a particular implementation of the module
commands in MOp; the state predicate $p$ describes the module’s invariant
(e.g., that a valid free list is stored starting at a location pointed to by a particular
head pointer), while $\eta$ is a function mapping each module command to
a particular local action. The predicate $p$ is required to be precise [11], meaning
that no state can have more than one substate satisfying $p$ (if a state $\sigma$ does
have a substate satisfying $p$, then we refer to that uniquely-defined state as $\sigma_p$).
Additionally, all module operations are required to preserve the invariant $p$:

$$\neg\,\sigma(\eta\,m)\mathsf{bad} \wedge \sigma \in p * \mathsf{true} \wedge \sigma(\eta\,m)\sigma' \implies \sigma' \in p * \mathsf{true}$$
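
The precision requirement on the module invariant can be checked by brute force on small examples. The following is a minimal sketch, assuming states are modeled as Python dictionaries from locations to values; this modeling choice and the names `substates`, `is_precise_on`, `head_cell`, and `any_one_cell` are hypothetical and not part of the formal development.

```python
from itertools import combinations

def substates(state):
    """Yield every substate (sub-dictionary) of a finite state."""
    items = sorted(state.items())
    for k in range(len(items) + 1):
        for chosen in combinations(items, k):
            yield dict(chosen)

def is_precise_on(pred, state):
    """True if at most one substate of `state` satisfies `pred`."""
    return sum(1 for s in substates(state) if pred(s)) <= 1

# Hypothetical invariant: the module owns exactly the cell at location 0.
def head_cell(s):
    return set(s) == {0}

# Hypothetical non-precise predicate: any single cell satisfies it.
def any_one_cell(s):
    return len(s) == 1

sigma = {0: 7, 1: 8}
print(is_precise_on(head_cell, sigma))      # one matching substate
print(is_precise_on(any_one_cell, sigma))   # two matching substates
```

Here `head_cell` picks out a unique substate of `sigma`, while `any_one_cell` is satisfied by two different substates and so could not serve as a module invariant.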

We define a big-step operational semantics parameterized by a module $(p, \eta)$.
This semantics is fundamentally the same as the one defined in [6]; the extended
TR contains the full details. The only aspect that is important to mention here
is that the semantics is equipped with a special kind of faulting called “access
violation.” Intuitively, an access violation occurs when a client operation’s
execution depends on the module’s portion of memory. More formally, it occurs
when the client operation executes safely on a state where the module’s memory
is present (i.e., a state satisfying $p * \mathsf{true}$), but faults when that memory is
removed from the state.
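
The definition of an access violation can be illustrated with a small executable sketch. It assumes states are dictionaries from locations to values and that a faulting command raises an exception; `Fault`, `read_loc`, `runs_safely`, and `access_violation` are hypothetical names introduced only for this illustration.

```python
class Fault(Exception):
    """Raised when a command touches a location that is not in the state."""

def read_loc(loc):
    """Client operation that only reads `loc` and leaves the state unchanged."""
    def op(state):
        if loc not in state:
            raise Fault(loc)
        return dict(state)
    return op

def runs_safely(op, state):
    """True if `op` executes on `state` without faulting."""
    try:
        op(state)
        return True
    except Fault:
        return False

def access_violation(op, state, module_part):
    """Safe with the module's memory present, faulting once it is removed."""
    client_only = {l: v for l, v in state.items() if l not in module_part}
    return runs_safely(op, state) and not runs_safely(op, client_only)

module_part = {0: 99}      # memory owned by the module
state = {0: 99, 5: 1}      # module memory plus one client cell

print(access_violation(read_loc(0), state, module_part))  # client reads module memory
print(access_violation(read_loc(5), state, module_part))  # client stays in its own memory
```

Reading location 0 depends on the module's memory and is flagged, whereas reading location 5 behaves the same with or without the module's portion of the state.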

The main theorem that we get out of this setup is a refinement simulation
between a program being run in the presence of an abstract module $(p, \eta)$, and
the same program being run in the presence of a concrete module $(q, \mu)$ that
implements the same module commands (i.e., $\lfloor\eta\rfloor = \lfloor\mu\rfloor$, where the floor notation
indicates domain). Suppose we have a binary relation $R$ relating states of the
abstract module to those of the concrete module. For example, if our modules
are memory managers, then $R$ might relate a particular set of memory locations
available for allocation to all lists containing that set of locations in some order.
To represent that $R$ relates abstract module states to concrete module states, we
require that whenever $\sigma_1 R \sigma_2$, $\sigma_1 \in p$ and $\sigma_2 \in q$. Given this relation $R$, we can
make use of the separating conjunction of Relational Separation Logic [14] and
write $R * \mathsf{Id}$ to indicate the relation relating any two states of the form $\sigma_p \bullet \sigma_c$
and $\sigma_q \bullet \sigma_c$, where $\sigma_p R \sigma_q$.

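Now, for any module $(p, \eta)$, let $C[(p, \eta)]$ be notation for the program $C$ whose
semantics have $(p, \eta)$ filled in for the parameter module. Then our main theorem
says that, if $\eta(f)$ simulates $\mu(f)$ under relation $R * \mathsf{Id}$ for all $f \in \lfloor\eta\rfloor$, then for
any program $C$, $C[(p, \eta)]$ also simulates $C[(q, \mu)]$ under relation $R * \mathsf{Id}$. More
formally, say that $C_1$ simulates $C_2$ under relation $R$ (written $R; C_2 \subseteq C_1; R$)
when, for all $\sigma_1, \sigma_2$ such that $\sigma_1 R \sigma_2$:

1.) $\sigma_1\llbracket C_1\rrbracket\mathsf{bad} \impliedby \sigma_2\llbracket C_2\rrbracket\mathsf{bad}$, and

2.) $\neg\,\sigma_1\llbracket C_1\rrbracket\mathsf{bad} \implies (\forall \sigma_2' .\ \sigma_2\llbracket C_2\rrbracket\sigma_2' \implies \exists \sigma_1' .\ \sigma_1\llbracket C_1\rrbracket\sigma_1' \wedge \sigma_1' R \sigma_2')$


Theorem 3. Suppose we have modules $(p, \eta)$ and $(q, \mu)$ with $\lfloor\eta\rfloor = \lfloor\mu\rfloor$ and a
refinement relation $R$ as described above, such that $R * \mathsf{Id}; \mu(f) \subseteq \eta(f); R * \mathsf{Id}$
for all $f \in \lfloor\eta\rfloor$. Then, for any program $C$, $R * \mathsf{Id}; C[(q, \mu)] \subseteq C[(p, \eta)]; R * \mathsf{Id}$.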

While the full proof can be found in the extended TR, we will semi-formally
describe here the one case that highlights why behavior preservation eliminates
the need for contents independence and growing relations: when $C$ is simply a
client command $c$. We wish to prove that $C[(p, \eta)]$ simulates $C[(q, \mu)]$, so suppose
we have related states $\sigma_1$ and $\sigma_2$, and executing $c$ on $\sigma_2$ results in $\sigma_2'$. Since $\sigma_1$
and $\sigma_2$ are related by $R * \mathsf{Id}$, we have that $\sigma_1 = \sigma_p \bullet \sigma_c$ and $\sigma_2 = \sigma_q \bullet \sigma_c$. We
know that (1) $(\sigma_q \bullet \sigma_c)\llbracket c\rrbracket\sigma_2'$, (2) $c$ is local, and (3) $c$ runs safely on $\sigma_c$ because the
client operation’s execution must be independent of the module state $\sigma_q$; thus
the backwards frame property tells us that $\sigma_2' = \sigma_q \bullet \sigma_c'$ and $\sigma_c\llbracket c\rrbracket\sigma_c'$. Now, if $c$
is behavior-preserving, then we can simply apply the forwards frame property,
framing on the state $\sigma_p$, to get that $\sigma_p \# \sigma_c'$ and $(\sigma_p \bullet \sigma_c)\llbracket c\rrbracket(\sigma_p \bullet \sigma_c')$, completing
the proof for this case. However, without behavior preservation, contents
independence and growing relations are used in [6] to finish the proof. Specifically,
because we know that $(\sigma_q \bullet \sigma_c)\llbracket c\rrbracket(\sigma_q \bullet \sigma_c')$ and that $c$ runs safely on $\sigma_c$, contents
independence says that $(\sigma \bullet \sigma_c)\llbracket c\rrbracket(\sigma \bullet \sigma_c')$ for any $\sigma$ whose domain is a subset of the
domain of $\sigma_q$. Therefore, we can choose $\sigma = \sigma_p$ because $R$ is a growing relation.

5.3  Relational  Separation  Logic 

Relational Separation Logic [14] allows for simple reasoning about the relationship
between two executions. Instead of deriving triples $\{P\}\ C\ \{Q\}$, a user of the
logic derives quadruples of the form:

$$\{R\}\ \begin{array}{c} C \\ C' \end{array}\ \{S\}$$

$R$ and $S$ are binary relations on states, rather than unary predicates. Semantically,
a quadruple says that if we execute the two programs in states that are
related by $R$, then both executions are safe, and any termination states will be
related by $S$. Furthermore, we want to be able to use this logic to prove program
equivalence, so we also require that initial states related by $R$ have the same
divergence behavior. Formally, we say that the above quadruple is valid if, for
any states $\sigma_1, \sigma_2, \sigma_1', \sigma_2'$:

1.) $\sigma_1 R \sigma_2 \implies \neg\,\sigma_1\llbracket C\rrbracket\mathsf{bad} \wedge \neg\,\sigma_2\llbracket C'\rrbracket\mathsf{bad}$

2.) $\sigma_1 R \sigma_2 \implies (\sigma_1\llbracket C\rrbracket\mathsf{div} \iff \sigma_2\llbracket C'\rrbracket\mathsf{div})$

3.) $\sigma_1 R \sigma_2 \wedge \sigma_1\llbracket C\rrbracket\sigma_1' \wedge \sigma_2\llbracket C'\rrbracket\sigma_2' \implies \sigma_1' S \sigma_2'$

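Relational Separation Logic extends the separating conjunction to work for
relations, breaking related states into disjoint, correspondingly-related pieces:

$$\sigma_1 (R * S) \sigma_2 \iff \exists \sigma_{1r}, \sigma_{1s}, \sigma_{2r}, \sigma_{2s} .\ \sigma_1 = \sigma_{1r} \bullet \sigma_{1s} \wedge \sigma_2 = \sigma_{2r} \bullet \sigma_{2s} \wedge \sigma_{1r} R \sigma_{2r} \wedge \sigma_{1s} S \sigma_{2s}$$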
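
The relational separating conjunction can be made concrete on finite states. The sketch below assumes states are dictionaries from locations to values and relations are Python predicates on pairs of states; the helper names `splits`, `rel_star`, `identity`, and the example relation `R` are hypothetical.

```python
from itertools import combinations

def splits(state):
    """Yield all ways to split a state into two disjoint pieces."""
    items = sorted(state.items())
    for k in range(len(items) + 1):
        for chosen in combinations(items, k):
            left = dict(chosen)
            right = {l: v for l, v in items if l not in left}
            yield left, right

def rel_star(R, S):
    """Return the relation R * S on pairs of finite states."""
    def related(s1, s2):
        return any(
            R(s1r, s2r) and S(s1s, s2s)
            for s1r, s1s in splits(s1)
            for s2r, s2s in splits(s2)
        )
    return related

def identity(s1, s2):
    """The relation Id: both states are equal."""
    return s1 == s2

# Hypothetical module relation: abstract cell 0 and concrete cell 100
# hold the same value.
def R(a, c):
    return set(a) == {0} and set(c) == {100} and a[0] == c[100]

r_star_id = rel_star(R, identity)
print(r_star_id({0: 3, 7: 1}, {100: 3, 7: 1}))  # module parts related, client parts equal
print(r_star_id({0: 3, 7: 1}, {100: 3, 7: 2}))  # client parts differ
```

The relation `r_star_id` plays the role of R * Id from the data refinement discussion: the module portions may differ in representation, but the client portions must coincide.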


Just as Separation Logic has a frame rule for enabling local reasoning, Relational
Separation Logic has a frame rule with the same purpose. This frame rule
says that, given that we can derive the quadruple above involving $R$, $S$, $C$, and
$C'$, we can also derive the following quadruple for any relation $T$:

$$\{R * T\}\ \begin{array}{c} C \\ C' \end{array}\ \{S * T\}$$

In [14], it is shown that the frame rule is sound when all programs are deterministic
but it is unsound if nondeterministic programs are allowed, so this frame
rule cannot be used when we have a nondeterministic memory allocator.

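To deal with nondeterministic programs, a solution is proposed in [14], in
which the interpretation of quadruples is strengthened. The new interpretation
for a quadruple containing $R$, $S$, $C$, and $C'$ is that, for any $\sigma_1, \sigma_2, \sigma, \sigma'$:

1.) $\sigma_1 R \sigma_2 \implies \neg\,\sigma_1\llbracket C\rrbracket\mathsf{bad} \wedge \neg\,\sigma_2\llbracket C'\rrbracket\mathsf{bad}$

2.) $\sigma_1 R \sigma_2 \wedge \sigma_1 \# \sigma \wedge \sigma_2 \# \sigma' \implies ((\sigma_1 \bullet \sigma)\llbracket C\rrbracket\mathsf{div} \iff (\sigma_2 \bullet \sigma')\llbracket C'\rrbracket\mathsf{div})$

3.) $\sigma_1 R \sigma_2 \wedge \sigma_1\llbracket C\rrbracket\sigma_1' \wedge \sigma_2\llbracket C'\rrbracket\sigma_2' \implies \sigma_1' S \sigma_2'$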

Note that this interpretation is the same as before, except that the second property
is strengthened to say that divergence behavior must be equivalent not only
on the initial states, but also on any larger states. It can be shown that the frame
rule becomes sound under this stronger interpretation of quadruples.

In our behavior-preserving setting, it is possible to use the simpler interpretation
of quadruples without breaking soundness of the frame rule. We could
prove this by directly proving frame rule soundness, but instead we will take a
shorter route in which we show that, when actions are behavior-preserving, a
quadruple is valid under the first interpretation above if and only if it is valid
under the second interpretation (i.e., the two interpretations are the same in
our setting). Since the frame rule is sound under the second interpretation, this
implies that it will also be sound under the first interpretation.

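Clearly, validity under the second interpretation implies validity under the
first, since it is a direct strengthening. To prove the converse, suppose we have a
quadruple (involving $R$, $S$, $C$, and $C'$) that is valid under the first interpretation.
Properties 1 and 3 of the second interpretation are identical to those of the first,
so all we need to show is that Property 2 holds. Suppose that $\sigma_1 R \sigma_2$, $\sigma_1 \# \sigma$, and
$\sigma_2 \# \sigma'$. By Property 1 of the first interpretation, we know that $\neg\,\sigma_1\llbracket C\rrbracket\mathsf{bad}$ and
$\neg\,\sigma_2\llbracket C'\rrbracket\mathsf{bad}$. Therefore, Termination Equivalence tells us that
$\sigma_1\llbracket C\rrbracket\mathsf{div} \iff (\sigma_1 \bullet \sigma)\llbracket C\rrbracket\mathsf{div}$, and that
$\sigma_2\llbracket C'\rrbracket\mathsf{div} \iff (\sigma_2 \bullet \sigma')\llbracket C'\rrbracket\mathsf{div}$. Furthermore, we know
by Property 2 of the first interpretation that $\sigma_1\llbracket C\rrbracket\mathsf{div} \iff \sigma_2\llbracket C'\rrbracket\mathsf{div}$. Hence
we obtain our desired result.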
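
Written out as a single chain, the argument composes three equivalences. The following restates the steps of the proof in one display and adds no new assumptions:

```latex
\begin{align*}
(\sigma_1 \bullet \sigma)\llbracket C \rrbracket \mathsf{div}
  &\iff \sigma_1 \llbracket C \rrbracket \mathsf{div}
  && \text{(Termination Equivalence, using } \neg\,\sigma_1 \llbracket C \rrbracket \mathsf{bad}\text{)} \\
  &\iff \sigma_2 \llbracket C' \rrbracket \mathsf{div}
  && \text{(Property 2 of the first interpretation)} \\
  &\iff (\sigma_2 \bullet \sigma')\llbracket C' \rrbracket \mathsf{div}
  && \text{(Termination Equivalence, using } \neg\,\sigma_2 \llbracket C' \rrbracket \mathsf{bad}\text{)}
\end{align*}
```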

In case the reader is curious, the reason that the frame rule under the first
interpretation is sound when all programs are deterministic is simply that determinism
(along with standard locality) implies Termination Equivalence. A proof
of this can be found in the extended TR.


5.4  Finite  Memory 

Since  standard  locality  allows  the  program  state  to  increase  during  execution, 
it  does  not  play  nicely  with  a  model  in  which  memory  is  finite.  Consider  any 
command  that  grows  the  program  state  in  some  way.  Such  a  command  is  safe  on 
the  empty  state  but,  if  we  extend  this  empty  state  to  the  larger  state  consisting  of 
all  available  memory,  then  the  command  becomes  unsafe.  Hence  such  a  command 
violates  Safety  Monotonicity. 
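
This failure of Safety Monotonicity is easy to reproduce in a toy model. The sketch below assumes a four-cell address space and a faulting allocator; `MEMORY`, `Fault`, `grow`, and `is_safe` are hypothetical names, and the allocator deterministically picks the lowest free location only to keep the example short.

```python
MEMORY = range(4)  # assumed finite address space: locations 0..3

class Fault(Exception):
    """Raised when a command cannot run safely."""

def grow(state):
    """State-growing command: add one fresh location, or fault if none is left."""
    free = [l for l in MEMORY if l not in state]
    if not free:
        raise Fault("out of memory")
    new_state = dict(state)
    new_state[free[0]] = 0
    return new_state

def is_safe(cmd, state):
    """True if `cmd` executes on `state` without faulting."""
    try:
        cmd(state)
        return True
    except Fault:
        return False

empty = {}
full = {l: 0 for l in MEMORY}   # an extension of the empty state

print(is_safe(grow, empty))     # safe on the small state
print(is_safe(grow, full))      # unsafe on the larger state
```

The command is safe on `empty` but faults on its extension `full`, which is exactly the situation that Safety Monotonicity rules out.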

There is one commonly-used solution for supporting finite memory without
enforcing behavior preservation: say that, instead of faulting on the state consisting
of all of memory, a state-growing command diverges. Furthermore, to satisfy
Termination Monotonicity, we also need to allow the command to diverge on
any state. The downside of this solution, therefore, is that it is only reasonable
when we are not interested in the termination behavior of programs.

When  behavior  preservation  is  enforced,  we  no  longer  have  any  issues  with 
finite  memory  models  because  program  state  cannot  increase  during  execution. 
The  initial  state  is  obviously  contained  within  the  finite  memory,  so  all  states 
reachable  through  execution  must  also  be  contained  within  memory. 


6  Related  Work  and  Conclusions 

The definition of locality (or local action), which enables the frame rule, plays
a critical role in Separation Logic [8,13,15]. Almost all versions of Separation
Logic, including their concurrent [3,10,4], higher-order [2], and relational [14]
variants, as well as mechanized implementations (e.g., [1]), have always used
the same locality definition that matches the well-known Safety and Termination
Monotonicity properties and the Frame Property [15].

In  this  paper,  we  argued  a  case  for  strengthening  the  definition  of  locality 
to  enforce  behavior  preservation.  This  means  that  the  behavior  of  a  program 
when  executed  on  a  small  state  is  identical  to  the  behavior  when  executed  on  a 
larger  state  —  put  another  way,  excess,  unused  state  cannot  have  any  effect  on 
program  behavior.  We  showed  that  this  change  can  be  made  to  have  no  effect  on 
the  usage  of  Separation  Logic,  and  we  gave  multiple  examples  of  how  it  simplifies 
reasoning  about  metatheoretical  properties. 

Determinism Constancy. One related work that calls for comparison is the property
of “Determinism Constancy” presented by Raza and Gardner [12], which is
also a strengthening of locality. While they use a slightly different notion of action
than we do, it can be shown that Determinism Constancy, when translated
into our context (and ignoring divergence behaviors), is logically equivalent to:

$$\sigma_0\llbracket C\rrbracket\sigma_0' \wedge \sigma_0' \# \sigma_1 \implies \sigma_0 \# \sigma_1 \wedge (\sigma_0 \bullet \sigma_1)\llbracket C\rrbracket(\sigma_0' \bullet \sigma_1)$$

For comparison, we repeat our Forwards Frame Property here:

$$\sigma_0\llbracket C\rrbracket\sigma_0' \wedge \sigma_0 \# \sigma_1 \implies \sigma_0' \# \sigma_1 \wedge (\sigma_0 \bullet \sigma_1)\llbracket C\rrbracket(\sigma_0' \bullet \sigma_1)$$

While our strengthening of locality prevents programs from increasing state during
execution, Determinism Constancy prevents programs from decreasing state.
The authors use Determinism Constancy to prove the same property regarding
footprints that we proved in Section 5.1. Note that, while behavior preservation
does not imply Determinism Constancy, our concrete logic of Section 3 does have
the property since it never decreases state (we chose to have the free command
put the deallocated cell back onto the free list, rather than get rid of it entirely).
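
The contrast between the two properties can be checked exhaustively on a tiny model. This is a sketch under stated assumptions: states are dictionaries over two locations and two values, a command maps a state to the list of its possible result states (faults and divergence are ignored), and the commands `dispose0` and `alloc` are hypothetical examples, not the commands of the concrete logic.

```python
from itertools import combinations, product

LOCS = (0, 1)      # assumed tiny address space
VALS = (0, 1)      # assumed tiny value domain

def all_states():
    """Enumerate every finite state over LOCS and VALS."""
    for k in range(len(LOCS) + 1):
        for locs in combinations(LOCS, k):
            for vals in product(VALS, repeat=k):
                yield dict(zip(locs, vals))

def disjoint(a, b):
    return not (a.keys() & b.keys())

def join(a, b):
    return {**a, **b}

def dispose0(state):
    """Shrinking command: drop location 0 (no result if it is absent)."""
    if 0 not in state:
        return []
    return [{l: v for l, v in state.items() if l != 0}]

def alloc(state):
    """Growing command: nondeterministically add any free location."""
    return [join(state, {l: 0}) for l in LOCS if l not in state]

def forwards_frame(cmd):
    """Frame on any s1 disjoint from the INITIAL state s0."""
    return all(
        disjoint(s0p, s1) and join(s0p, s1) in cmd(join(s0, s1))
        for s0 in all_states() for s0p in cmd(s0)
        for s1 in all_states() if disjoint(s0, s1)
    )

def determinism_constancy(cmd):
    """Frame on any s1 disjoint from the FINAL state s0p."""
    return all(
        disjoint(s0, s1) and join(s0p, s1) in cmd(join(s0, s1))
        for s0 in all_states() for s0p in cmd(s0)
        for s1 in all_states() if disjoint(s0p, s1)
    )

print(forwards_frame(dispose0), determinism_constancy(dispose0))
print(forwards_frame(alloc), determinism_constancy(alloc))
```

In this model the shrinking command satisfies the Forwards Frame Property but not Determinism Constancy, and the growing command does the opposite, matching the observation that one property forbids growth and the other forbids shrinkage.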

While Determinism Constancy is strong enough to prove the footprint property,
it does not provide behavior preservation: an execution on a small state
can still become invalid on a larger state. Thus it will not, for example, help in
resolving the dilemma of growing relations in the data refinement theory. Due
to the lack of behavior preservation, we do not expect the property to have a
significant impact on the metatheory as a whole. Note, however, that there does
not seem to be any harm in using both behavior preservation and Determinism
Constancy. The two properties together enforce that the area of memory
accessible to a program be constant throughout execution.

Module Reasoning. Besides our discussion of data refinement in Section 5.2, there
has been some previous work on reasoning about modules and their implementations.
In [11], a “Hypothetical Frame Rule” is used to allow modular reasoning
when a module’s implementation is hidden from the rest of the code. In [2],
a higher-order frame rule is used to allow reasoning in a higher-order language
with hidden module or function code. However, neither of these works discusses
relational reasoning between different modules. We are not aware of any relational
logic for reasoning about modules.

Abstract. Thread management is an essential functionality in OS kernels. However,
verification of thread management remains a challenge, due to two conflicting
requirements: on the one hand, a thread manager (operating below the thread
abstraction layer) should hide its implementation details and be verified independently
from the threads being managed; on the other hand, the thread management
code in many real-world systems is concurrent and might be executed by the
threads being managed, so it seems inappropriate to abstract threads away in the
verification of thread managers. Previous approaches to kernel verification view
thread managers as sequential code, and thus cannot be applied to thread management
in realistic kernels. In this paper, we propose a novel two-layer framework
to verify concurrent thread management. We choose a lower abstraction level
than the previous approaches, where we abstract away the context switch routine
only, and allow the rest of the thread management code to run concurrently in the
upper level. We also treat thread management data as abstract resources so that
threads in the environment can be specified in assertions and be reasoned about
in a proof system similar to concurrent separation logic.


1  Introduction 

Thread scheduling in modern operating systems provides the functionality of virtualizing
processors: when a thread is waiting for an event, it gives the control of the processor
to another thread to create the illusion that each thread has its own processor.

Inside a kernel, a thread manager supervises all threads in the system by manipulating
data structures called thread control blocks (TCBs). A TCB is used to record
important information about a thread, such as the machine context (or processor state),
the thread identifier, the status description, the location and size of the stack, the priority
for scheduling, and the entry point of thread code. The TCBs are often implemented
using data structures such as queues for ready and waiting threads. Clearly, modifying
thread queues and TCBs would drastically change the behaviors of threads. Therefore,
a correct implementation of thread management is crucial for guaranteeing the safety
of the whole system. Unfortunately, modular verification of real-world thread management
code remains a big challenge today.

The challenge comes from two apparently conflicting goals which we want to achieve
at the same time: abstraction (for modular verification) and efficiency (for real-world
usability). On the one hand, TCBs, thread queues, and the thread scheduler are specifics
used to implement threads, so they should sit at a lower abstraction layer. It is natural to
abstract them away from threads, and to verify threads and the thread scheduler separately
at different abstraction layers. Previous work has shown it is extremely difficult
to verify them together in one logic system. On the other hand, in many real-world
systems such as Linux-2.6.10 and FreeBSD-5.2, the thread scheduler code
itself is also concurrent in the sense that there may be multiple threads in the system
running the scheduler at the same time. For instance, when a thread invokes a thread
scheduler routine (e.g., cleaning up dead threads, load balancing, or thread scheduling)
and traverses the thread queue, it may be preempted by other threads that may call
the same routine and traverse the queue too. Also, in some systems the thread
scheduling itself is implemented as a separate thread that runs concurrently with other
threads. In these cases, we need to verify thread schedulers in a “multi-threaded” logic,
taking threads into account instead of abstracting them away.

Earlier work on thread scheduling verification fails to achieve the two goals at the
same time. Ni et al. verified both the thread switch and the threads in one logic,
which treats thread return addresses as first-class code pointers. Although their method
may support concurrent thread schedulers in real systems, it loses the abstraction of
threads completely, and makes the logic and specifications too complex for practical
use. Recent work adopts two-layer verification frameworks to verify concurrent
kernels. Kernel code is divided into two layers: sequential code in the lower layer and
concurrent code in the upper layer. In their frameworks, they put the code manipulating TCBs
(e.g., thread schedulers) in the lower layer, and hide the TCBs of threads in the upper layer
so that the threads cannot modify them. Then they use sequential program logics to
verify thread management code. However, this approach is not usable for many realistic
kernels where thread managers themselves are concurrent and the threads are allowed
to modify the TCBs. Other work on OS verification only supports non-reentrant
kernels, i.e., there is only one thread running in the kernel at any time.

In this paper, we propose a more natural framework to verify concurrent thread managers.
Our framework follows the two-layer approach, so concurrent code at the upper
layer can be verified modularly with thread abstractions. However, the abstraction level
of our framework is much lower than that of previous frameworks. The majority of the
code manipulating thread queues and TCBs is put in the upper layer and can be verified
as concurrent code. Our framework successfully achieves both verification goals: it
not only allows abstraction and modular verification, but also supports concurrency in
real-world thread management.

Our  work  is  based  on  previous  work  on  thread  scheduler  verification,  but  makes  the 
following  new  contributions: 

- We introduce a fine-grained abstraction in our two-layer verification framework.
The abstraction protects only a small part of sensitive data in TCBs, and at the same
time allows multiple threads to modify other parts of TCBs safely. Our division of
the two abstraction layers is consistent with many real systems. It is more natural
and can support more realistic thread managers than previous work.

- In the upper layer, we introduce the idea of treating threads as resources. The abstract
thread resources can be specified explicitly in the assertion language, and
their use by concurrent programs can be reasoned about modularly following concurrent
separation logic (CSL). By enforcing the invariant that the abstract
resource is consistent with the concrete thread metadata, we can ensure the safety
of the accesses over TCBs and thread queues inside threads.

- Because of the fine-grained abstraction of our approach, the semantics of thread
scheduling do not have to be hardwired in the logic. Therefore, our framework
can be used to verify various implementation patterns of thread management. We
show how to verify the three common patterns of thread scheduling in realistic OS
kernels (while previous two-layer frameworks can only verify one of them).

- In our extended TR, we also use our framework to verify thread schedulers with
hardware interrupts, scheduling over multiprocessors with load balancing, and a set
of other thread management routines such as thread creation, join and termination.

Fig. 1. Three patterns of scheduling

The rest of this paper is organized as follows: we first introduce a simplified abstract
machine model for the higher layer of our framework in Sec. 2 to show our main idea;
we then propose in Sec. 4 our proof system for concurrent thread scheduling code over the
abstract machine. We show how to verify two prototypes of schedulers based on context
switch in Sec. 5. We compare with related work in Sec. 6 and then conclude.

2  Challenges  and  our  approach 

In this section, we illustrate the challenges of verifying thread scheduling code by
showing three patterns of schedulers and discussing the verification issues. Then we informally
explain the basic ideas of our approach.

2.1  Three  patterns  of  thread  scheduling 

By deciding which thread to run next, the thread scheduler is responsible for best utilizing
the system and making multiple threads run concurrently. The scheduling process
consists of the following steps: selecting which thread to run next in a thread queue by
modifying TCBs, saving the context data of the current thread, and loading the context
data of the next thread. Context data is the state of the processor. By saving and
loading context data, the processor can run in multiple control flows, i.e., threads. Usually,
context data can be saved on stacks or TCBs (we assume in this paper that context
data is saved in TCBs for brevity of presentation). There are various ways to implement
thread schedulers. In Fig. 1 we show three common implementation patterns, all
modeled from real systems.

Pattern (I) is popular among embedded OS kernels (e.g., FreeRTOS) and some
micro-kernels (e.g., Minix [8] and Exokernel). The scheduler in this pattern is invoked
by function calls or interrupts. Thereafter, the scheduling is done in the following
steps: (1) saving the current context data, (2) finding the next thread, and (3) loading the
context data of the next thread (and switching to it implicitly through function return).

In pattern (II), the scheduling process is a function with the following steps: (1)
finding the next thread first, (2) performing context switch (saving the current context
data, loading the next one, and jumping to the next thread immediately), and (3) running
the remaining code of the function when the control is switched back from other threads.
This pattern is modeled from some mainstream monolithic kernels (e.g., Linux and
FreeBSD). Some embedded kernels (e.g., RTEMS and uClinux) adopt it too. Note that
both the involved threads should be allowed to access the thread queue and TCBs when
calling the scheduler.

Pattern (III) uses a separate thread, called the scheduler thread, to do scheduling. A
thread may perform scheduling by doing a context switch to the scheduler thread. The
scheduler thread is a big infinite loop: finding the next thread; performing a context switch
to the next thread; and looping after return. This pattern can be seen in the GNU-pth
thread library, MIT-xv6 kernel, L4::Ka, etc. Similar to pattern (II), all involved threads
in this pattern should be allowed to access the TCB of the scheduler thread and the
thread queue.
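
As an illustration of the three steps of pattern (I), the following is a minimal simulation sketch. It assumes the processor state is a dictionary of registers, context data is saved in TCBs, and the next thread is chosen round-robin; the `TCB` fields and the function `schedule_pattern_one` are hypothetical and not taken from any of the kernels named above.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TCB:
    tid: str
    context: dict = field(default_factory=dict)  # saved processor state
    status: str = "ready"

def schedule_pattern_one(cpu, current, ready):
    """Pattern (I): save current context, find next thread, load its context."""
    current.context = dict(cpu)      # (1) save the current context data
    current.status = "ready"
    ready.append(current)
    nxt = ready.popleft()            # (2) find the next thread (round-robin here)
    nxt.status = "running"
    cpu.clear()
    cpu.update(nxt.context)          # (3) load the context data of the next thread
    return nxt

cpu = {"pc": 10, "sp": 100}
a = TCB("A", status="running")
b = TCB("B", context={"pc": 20, "sp": 200})
ready = deque([b])

running = schedule_pattern_one(cpu, a, ready)
print(running.tid, cpu, [t.tid for t in ready])
```

Patterns (II) and (III) differ mainly in where the context switch sits relative to the queue manipulation, which a sequential simulation like this one does not capture.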

2.2  Challenges 

As we can see from the patterns in Fig. 1, the control flow in the scheduling process
is very complicated. Threads switch back and forth via manipulating the thread queues
and TCBs. It is very natural to share TCBs and the thread queue among threads in order
to support all these scheduling patterns. On the other hand, it is important to ensure that
the TCBs are accessed in the right way. The system would go wrong if, for instance, a
thread erased the context data of another by mistake, or put a dead thread back into the
ready thread queue.

To guarantee the safety of the scheduling process, we must fulfill two requirements:
(1) No thread can incorrectly modify the context data in TCBs.
(2) The scheduler should know the status of each thread in the thread queues and decide
which to run next.

To satisfy requirement (1), some previous work adopts a two-layer
approach and protects the TCBs through abstraction, where the TCBs are simply hidden
from kernel threads and become inaccessible. This approach can be used to verify
schedulers of pattern (I), for which we show the abstraction line in Fig. 2(a). Threads
above the line cannot modify TCBs, while the scheduler is below this line and has full
access to them. The lower-layer scheduler provides an abstract interface to the verification
of concurrent thread code at the upper layer. Since it modifies the TCBs at
scheduling time only, we can view the scheduler as a sequential function which does not
belong to any thread and can be verified by a conventional Hoare-style logic. However,
this approach cannot verify the other two patterns, nor does it fulfill requirement (2)
for concurrent schedulers, where the TCBs are manipulated concurrently (not sequentially
as in pattern (I)) and should be known by threads. That is, we cannot completely
hide the TCBs from the upper-layer concurrent threads for patterns (II) and (III).

2.3  Our  approach 

If we inspect the TCB data carefully, we can see that only a small part of the data is
crucial to thread behaviors and cannot be accessed concurrently. It is unnecessary to
access it concurrently either. The data includes the machine context data and the stack
location. We call them safety-critical values. Some values can be modified concurrently,
but their correctness is still important to the safety of the kernel; e.g., the pointers
organizing thread queues and the status field are values of this kind. Other values in
TCBs have nothing to do with the safety of the kernel and can certainly be modified
concurrently, e.g., the name of a thread or debug information.

Lowering the abstraction level. To protect the safety-critical part of TCBs, we lower
the abstraction line, as shown in Fig. 2(b). In our framework, the safety-critical data of
TCBs is under the abstraction line and hidden from threads. The corresponding operations
such as context saving, loading and switching are abstracted away from threads
too, with only interfaces exposed to the upper layer. The other parts of TCBs are lifted
above this line and can be accessed by concurrent threads.

Building abstract threads. We still need to ensure that concurrent accesses to
non-safety-critical TCB data are correct. For instance, we cannot allow a dead thread to
be put onto a ready thread queue. To address this issue, we build abstract threads, which
carry information about threads from their TCBs so that threads can safely modify each
other's TCB data. In Fig. 3, we use the notation $[t]$ to specify the running thread and
the notation $\langle t \rangle$ for a ready thread, where $t$ is the identifier of the
thread. Knowing that a ready thread B pointed to by next exists (i.e.,
$\langle B \rangle$), we know it is safe to switch to it via the operation
cswitch(A, next). Since abstract threads can be described in specifications, they allow
us to write more intuitive and readable specifications for kernel code.

Treating abstract threads as resources. Like heap resources, abstract thread resources
can be either local or shared, and their ownership can be transferred. On a context
switch, a thread transfers some of the (shared) abstract thread resources, along with the
shared memory, to the next thread. As shown in Fig. 3, when thread A switches to thread B,
$[A]$ is changed to $\langle A \rangle$ after context saving; $\langle A \rangle$ and
$\langle B \rangle$ are transferred to thread B along with the shared memory resource
next; then $\langle B \rangle$ is changed to $[B]$ after context loading. With the
transferred thread resources, thread B knows there is a ready thread A to switch to.
Therefore, by treating abstract threads as resources, we find a simple and natural way to
specify and reason about context switches. We design a proof system similar to CSL
(concurrent separation logic) for modular verification, with support for ownership
transfer of thread resources.

Defining concrete thread resources. To establish the soundness of our proof system, we
must ensure that abstract threads can be reified by concrete threads. The concrete
representation of abstract threads, including stacks, TCBs, etc., can be defined globally.
In Fig. 3, suppose that thread A is running; we ensure that there are two blocks of
resources in the system. One is the running thread $\mathrm{CThrd}_A$ and the other is a
ready thread $\mathrm{RThrd}_B$. They correspond to the abstract threads $[A]$ and
$\langle B \rangle$ in the assertions of thread A. We use the concrete thread resources to
specify the global invariant of the machine, which allows us to prove the soundness of
our proof system.

3  Machine  model 

In  this  section,  we  define  a  two-layer  machine  model.  The  physical  machine  we  use  is 
similar  to  realistic  hardware,  where  no  concept  of  thread  exists.  Based  on  it,  we  define 
an  abstract  machine  with  logical  abstract  threads,  whose  meta-data  is  abstracted  into 
a  thread  pool.  Moreover,  the  operation  of  context  switch  is  abstracted  as  a  primitive 
abstract  instruction. 

Physical machine. The formal definition of the physical machine is shown in Fig. 4
(left side). A machine configuration W consists of a code block C, a memory block M,
a register file R and a program counter pc. The machine has 6 general registers. We
define a few common instructions, which are used to write the programs in this paper;
their meanings and operational semantics follow the usual conventions. For simplicity, we
omit many realistic hardware details, e.g., address alignment and bit-level arithmetic.

Abstract machine. The abstract machine is shown in Fig. 4 (right side); threads are
introduced at this level. It is more intuitive to build a proof system (Sec. 4) to verify
concurrent kernel code at this level. A thread pool P is a partial mapping from thread
IDs t to abstract threads T. Each abstract thread has a tag specifying its status, which is
either running (run) or ready (rdy). Each ready thread has a copy of the saved register
file as its machine context data. The abstract instructions include an abstract
context-switch operation (cswitch) and the physical machine instructions defined on the
left. We model the operational semantics using the step transition relation
$W \longmapsto W'$ defined in Fig. 5. The abstract instruction cswitch takes two thread IDs
as arguments in a0 and a1; the first must be tagged run and the second tagged rdy in the
thread pool. After cswitch, the two abstract threads exchange tags, and control of the
processor passes from the old thread to the new one. The registers of the old thread are
saved in the source abstract thread, and the registers stored in the destination thread
are loaded into the machine state. Apart from cswitch, the state transitions of the
instructions are similar to those of the physical machine.

Fig. 4. Physical and abstract machine models
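The effect of cswitch on the thread pool can be illustrated with a small executable model. This is only a sketch of the step described above, not the semantics of Fig. 5: the names `abs_machine`, `abs_thread`, `cswitch_step`, the fixed pool size `NTHREADS` and the extra `TAG_NONE` tag are assumptions made for illustration, and the two thread IDs are passed as function arguments instead of being read from a0 and a1.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NREGS    6   /* the machine in the text has 6 general registers */
#define NTHREADS 4   /* hypothetical pool size, chosen for this sketch */

/* TAG_NONE marks an ID that is not in the (partial) thread pool. */
enum tag { TAG_NONE, TAG_RUN, TAG_RDY };

struct abs_thread {
    enum tag tag;
    int saved[NREGS];   /* saved register file; meaningful only when tag == TAG_RDY */
};

struct abs_machine {
    int regs[NREGS];                    /* register file R */
    struct abs_thread pool[NTHREADS];   /* thread pool P, indexed by thread ID */
};

/* One cswitch step. `from` must be the running thread and `to` a ready one;
 * otherwise the step is not enabled and the function returns false. */
bool cswitch_step(struct abs_machine *m, int from, int to)
{
    assert(m != NULL);
    if (from < 0 || from >= NTHREADS || to < 0 || to >= NTHREADS)
        return false;
    if (m->pool[from].tag != TAG_RUN || m->pool[to].tag != TAG_RDY)
        return false;

    memcpy(m->pool[from].saved, m->regs, sizeof m->regs);  /* save old context */
    memcpy(m->regs, m->pool[to].saved, sizeof m->regs);    /* load new context */
    m->pool[from].tag = TAG_RDY;                           /* exchange the tags */
    m->pool[to].tag = TAG_RUN;
    return true;
}
```

With thread 0 tagged run and thread 1 tagged rdy, `cswitch_step(&m, 0, 1)` succeeds and leaves thread 1 as the only running thread; calling it again with the same arguments is rejected, because thread 0 is now only ready.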

Machine translation. In our proof system, once a program is proved safe at the abstract
machine level, it should be safe at the physical machine level as well. We define a
translation relation between the abstract machine and the physical machine (given in the
TR). The code block at the abstract machine level is extended with the code implementing
context switch, and the abstract instruction cswitch is translated to a call instruction
that invokes this implementation. The memory block at the abstract machine level is
translated to a physical memory block by merging it with the memory where context data is
stored. Through this translation, it can be proved that any program that is safe on the
abstract machine is also safe on the physical machine.

4  Proof  system 

In this section, we extend the assertion language of CSL to specify thread resources,
and propose a small proof system that supports the verification of concurrent code
modifying TCBs at the assembly level.

Fig. 5. Operational semantics of abstract machine


4.1  Assertion  language  and  code  specification 


We use p and q as assertion variables, which are predicates over machine states. The
assertion constructs, adapted from separation logic El, are shallowly embedded in the
meta language, as shown in Fig. 6. Our assertion language has two special constructs for
abstract threads: $\langle t \rangle$ specifies a ready thread and $[t]$ specifies the
currently running thread. Since threads are explicit resources in the abstract machine,
their machine context data (the values in registers) are preserved across context
switches. Hence register resources should not be shared. We explicitly mark a pure
assertion with $\sharp$, which forbids the assertion from specifying resources. The unary
notation $\circ p$ marks an assertion p that specifies only shared resources and no
thread-local resources (e.g., registers). Registers are also treated as resources:
$r \mapsto w$ specifies a register r holding the value w, and
$r_1, \ldots, r_n \mapsto w_1, \ldots, w_n$ is a compact form for multiple registers.

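$$
\begin{array}{rcl}
\mathrm{true} & \triangleq & \lambda(M,R,P).\ \mathrm{True} \\
\mathrm{false} & \triangleq & \lambda(M,R,P).\ \mathrm{False} \\
\mathrm{emp} & \triangleq & \lambda(M,R,P).\ M = \{\} \wedge R = \{\} \wedge P = \{\} \\
p * q & \triangleq & \lambda(M,R,P).\ \exists M_1, M_2, R_1, R_2, P_1, P_2.\ M = M_1 \uplus M_2 \wedge R = R_1 \uplus R_2 \wedge P = P_1 \uplus P_2 \\
 & & \quad \wedge\ p\ (M_1,R_1,P_1) \wedge q\ (M_2,R_2,P_2) \\
p \mathrel{-\!\!*} q & \triangleq & \lambda(M,R,P).\ \forall M_1,R_1,P_1,M',R',P'.\ (M' = M_1 \uplus M \wedge R' = R_1 \uplus R \wedge P' = P_1 \uplus P) \\
 & & \quad \rightarrow p\ (M_1,R_1,P_1) \rightarrow q\ (M',R',P') \\
p \mathbin{\wedge\!\!\wedge} q & \triangleq & \lambda S.\ (p\ S) \wedge (q\ S) \\
p \mathbin{\vee\!\!\vee} q & \triangleq & \lambda S.\ (p\ S) \vee (q\ S) \\
\exists v.\ p & \triangleq & \lambda S.\ \exists v.\ p\ S \\
\sharp p & \triangleq & \lambda(M,R,P).\ p \wedge M = \{\} \wedge R = \{\} \wedge P = \{\} \\
\circ p & \triangleq & \lambda(M,R,P).\ p\ (M,R,P) \wedge R = \{\} \\
r \mapsto w & \triangleq & \lambda(M,R,P).\ R = \{r \leadsto w\} \wedge M = \{\} \wedge P = \{\} \\
r \hookrightarrow w & \triangleq & \lambda(M,R,P).\ \exists R'.\ R = \{r \leadsto w\} \uplus R' \\
l \mapsto w & \triangleq & \lambda(M,R,P).\ M = \{l \leadsto w\} \wedge l \neq \mathrm{NULL} \wedge R = \{\} \wedge P = \{\} \\
{[t]} & \triangleq & \lambda(M,R,P).\ P = \{t \leadsto \mathrm{run}\} \wedge t \neq \mathrm{NULL} \wedge M = \{\} \wedge R = \{\} \\
\langle t \rangle & \triangleq & \lambda(M,R,P).\ P = \{t \leadsto (\mathrm{rdy}, \_)\} \wedge t \neq \mathrm{NULL} \wedge M = \{\} \wedge R = \{\}
\end{array}
$$

Fig. 6. Definition of selected assertion constructs

We borrow the idea from SCAP and use a $(p, g)$ pair to specify instructions at the
assembly level. The pre-condition p describes the state before the first instruction of
an instruction sequence, while the action g describes the actions performed by the whole
instruction sequence. In the proof system, each instruction is associated with a $(p, g)$
pair, where g describes the actions from this instruction to the end of the current
function. The $(p, g)$ pairs of all instructions in C are collected in $\Psi$, a global
mapping from labels to specifications. The specification form $(p, g)$ differs from the
traditional pair of pre-condition and post-condition, which are both assertions and are
related by auxiliary variables. We can still use a notation to specify instructions in
the traditional style.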


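$$
\left\{\begin{array}{c} p \\ q \end{array}\right\}_{(v_1,\ldots,v_n)}
\triangleq
\bigl(\lambda S.\ \exists v_1,\ldots,v_n.\ (p(v_1,\ldots,v_n) * \mathrm{true})\ S,\ \
\lambda S, S'.\ \forall p'.\ \forall v_1,\ldots,v_n.\ (p(v_1,\ldots,v_n) * p')\ S \rightarrow (q(v_1,\ldots,v_n) * p')\ S'\bigr)
$$

where p is the pre-condition of the instructions, q is the post-condition, and
$v_1, \ldots, v_n$ are auxiliary variables occurring in the pre-condition and the
post-condition. We define a binary operator for composing two pairs into one.

$$
(p, g) \rhd (p', g') \triangleq
\bigl(\lambda S.\ p\ S \wedge (\forall S'.\ g\ S\ S' \rightarrow p'\ S'),\ \
\lambda S, S''.\ p\ S \rightarrow (\exists S'.\ g\ S\ S' \wedge g'\ S'\ S'')\bigr)
$$

If an instruction sequence satisfies $(p, g)$ and the following instruction sequence
satisfies $(p', g')$, then the composed instruction sequence satisfies
$(p, g) \rhd (p', g')$. The weakening relation between two pairs is defined as below:

$$
(p, g) \Rightarrow (p', g') \triangleq \forall S.\ p\ S \rightarrow p'\ S \wedge (\forall S'.\ g'\ S\ S' \rightarrow g\ S\ S')
$$

i.e., the pre-condition p is stronger than p' and the action g is weaker than g'.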

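$$
\begin{array}{llcl}
(\mathit{Assert}) & p, q & ::= & \mathrm{true} \mid \mathrm{false} \mid \mathrm{emp} \mid p * q \mid p \mathrel{-\!\!*} q \mid p \mathbin{\wedge\!\!\wedge} q \mid p \mathbin{\vee\!\!\vee} q \mid \exists v.\ p \mid l \mapsto w \\
 & & \mid & [t] \mid \langle t \rangle \mid \sharp p \mid \circ p \mid r \mapsto w \mid r \hookrightarrow w \\
(\mathit{Action}) & g & \in & \mathit{State} \rightarrow \mathit{State} \rightarrow \mathit{Prop} \\
(\mathit{Spec}) & \Psi & ::= & \{f : (p, g)\}^{*}
\end{array}
$$

4.2 Invariant for shared resources and inference rules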

As mentioned previously, our proof system draws the idea of ownership transfer from
CSL. By defining invariants for shared resources, our proof system ensures safe
operations on TCBs.

Unlike the invariant in concurrent separation logic, the invariant of shared resources
defined in our proof system is parameterized by two thread IDs: $I(t_s, t_d)$. Briefly,
the invariant describes the shared resources just before a context switch from the thread
$t_s$ to the thread $t_d$. One benefit of the parameters is that the invariant is
thread-specific.

Like the abstract invariant I in CSL, the invariant $I(t_s, t_d)$ is abstract and can be
instantiated with concrete definitions to verify various programs, as long as the
instantiation satisfies the requirement of being precise ini.

Precisely, the invariant $I(t_s, t_d)$ describes the shared resources when the context
switch is invoked from the thread $t_s$ to the thread $t_d$, excluding the resources of
the two threads themselves. Since the control flow from one thread to another through a
context switch is deterministic, every pair of threads may negotiate a particular
invariant that differs from those of other pairs. We can thus define different assertions
(of shared resources) depending on the source and the destination threads of a context
switch. This is quite different from concurrent code at user level, where context
switches are non-deterministic and the scheduling algorithm is abstracted away.

The judgment for instructions in our proof system has the form
$\Psi, I \vdash \{(p, g)\}\ pc : c$, where $\Psi$ and $I$ are given as specifications.
The judgment states that the instruction sequence starting with c at the label pc and
ending with a ret satisfies the specification $(p, g)$ under $\Psi$ and $I$. Some
selected inference rules for instructions are shown in Fig. 7.

In the (ADD) rule, the premise says that the specification $(p, g)$ implies the action
of the add instruction composed with the specification of the next instruction,
$\Psi(pc+1)$. The action of the add instruction is as follows: if the destination
register $r_d$ contains the value $w_1$ and the source register $r_s$ contains the value
$w_2$, then after the instruction $r_d$ contains the sum of $w_1$ and $w_2$, while $r_s$
remains unchanged.

Functions are reasoned about with the (CALL) and (RET) rules. The (CALL) rule says that
the specification $(p, g)$ implies the action composed of (1) the action of the
instruction call, (2) the specification of the invoked function, $\Psi(f)$, (3) the
action of the instruction ret, and (4) the specification of the next instruction,
$\Psi(pc+1)$. The (RET) rule says that the specification $(p, g)$ implies an empty
action, which means that the actions of the current function must already be fulfilled.

The most important rule is (CSW). The pre-condition of cswitch requires the following
resources: the current thread resource, the registers a0 containing the current thread ID
t and a1 containing the destination thread ID t', and the shared resources satisfying the
invariant $\circ I(t, t')$. After returning from the context switch, the current thread
owns the shared resources again (satisfying $\circ I(t'', t)$ for some t'').

4.3 Invariant of global resources and soundness

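Each abstract thread corresponds to the part of the global resources that represents the
concrete resources allocated for this thread. For example, for an abstract thread
$\langle t \rangle$, there exist the resources of its TCB, its stack, and its private
resources. Therefore, all resources can be divided into parts, each of which is
associated with one thread. The global invariant GINV, defined in Fig. 8, describes this
partition of all resources globally. The invariant is the key to proving the soundness
theorem of our proof system.

$$
\dfrac{(p, g) \Rightarrow \left\{\begin{array}{l} (r_d \mapsto w_1) * (r_s \mapsto w_2) \\ (r_d \mapsto w_1 + w_2) * (r_s \mapsto w_2) \end{array}\right\}_{(w_1, w_2)} \rhd \Psi(pc+1)}
{\Psi, I \vdash \{(p, g)\}\ pc : \mathtt{add}\ r_d, r_s}
\quad (\mathrm{ADD})
$$

$$
\dfrac{(p, g) \Rightarrow \left\{\begin{array}{l} \mathrm{ra} \mapsto \_ \\ \mathrm{ra} \mapsto pc+1 \end{array}\right\} \rhd \Psi(f) \rhd \left\{\begin{array}{l} \mathrm{ra} \mapsto \_ \\ \mathrm{ra} \mapsto \_ \end{array}\right\} \rhd \Psi(pc+1)}
{\Psi, I \vdash \{(p, g)\}\ pc : \mathtt{call}\ f}
\quad (\mathrm{CALL})
$$

$$
\dfrac{(p, g) \Rightarrow \left\{\begin{array}{l} \mathrm{emp} \\ \mathrm{emp} \end{array}\right\}}
{\Psi, I \vdash \{(p, g)\}\ pc : \mathtt{ret}}
\quad (\mathrm{RET})
\qquad
\dfrac{(p, g) \Rightarrow \Psi(f)}
{\Psi, I \vdash \{(p, g)\}\ pc : \mathtt{jmp}\ f}
\quad (\mathrm{JMP})
$$

$$
\dfrac{(p, g) \Rightarrow \left\{\begin{array}{l} [t] * (\mathrm{a0}, \mathrm{a1}, \mathrm{ra} \mapsto t, t', \_) * \langle t' \rangle * \circ I(t, t') \\ {[t]} * (\mathrm{a0}, \mathrm{a1}, \mathrm{ra} \mapsto t, t', \_) * \exists t''.\ \langle t'' \rangle * \circ I(t'', t) \end{array}\right\}_{(t, t')} \rhd \Psi(pc+1)}
{\Psi, I \vdash \{(p, g)\}\ pc : \mathtt{cswitch}}
\quad (\mathrm{CSW})
$$

Fig. 7. Inference rules (selected)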

First, for each thread, we define a predicate Cont to specify its resources and control
flow, i.e., the continuation of this thread. The first parameter n of this predicate is
the number of functions nested in the thread's control flow. If n is zero, the thread is
running in its topmost function, which is required to be an infinite loop and cannot
return. If n is greater than zero, the predicate says that there is a specification
$(p, g)$ in $\Psi$ at pc such that the resources of the thread satisfy p, and g
guarantees that the thread will continue to satisfy Cont recursively after it returns to
the address retaddr.

The concrete resources of a running thread are specified by a continuation Cont with
an additional condition: the running thread owns all registers. The parameter pc points
to the next instruction the thread is going to run. Here we use the abbreviation
$\lfloor R \rfloor$ to denote the resources of all registers, except that the value in ra
is of no interest.

For a ready thread (or a runnable thread), the concrete resources are defined by a
separating implication: given (1) the resources of the saved machine context
$\lfloor R \rfloor$, (2) the abstract resource of the thread itself, $[t]$, (3) another
ready thread t', and (4) the shared resources specified by $\circ I(t', t)$, the
resources of the ready thread can be transformed into the resources of a running thread.
The thread ID is given by the second parameter of RThrd, and the third parameter is the
machine context data saved in its TCB. Note that the program counter of a ready thread is
saved in the register ra.

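The whole machine state can be partitioned so that each part is owned by one thread,
which is either running or ready. Thus, the global invariant GINV is defined as a
separating conjunction of CThrd and RThrd. The structure of GINV is isomorphic to the
thread pool P: the abstract running thread is mapped to the resource specified by CThrd;
each abstract ready thread is mapped to a resource specified by RThrd. Note that GINV
requires that there be one and only one running abstract thread, since the physical
machine has a single processor. Our proof system ensures that the machine state always
satisfies the global invariant, i.e., $(\mathrm{GINV}(\Psi, P, pc))\ (M, R, P)$.

$$
\begin{array}{rcl}
\lfloor R \rfloor & \triangleq & (\mathrm{ra} \mapsto \_) * (\mathrm{v0} \mapsto R(\mathrm{v0})) * (\mathrm{sp} \mapsto R(\mathrm{sp})) \\
 & & *\ (\mathrm{a0} \mapsto R(\mathrm{a0})) * (\mathrm{a1} \mapsto R(\mathrm{a1})) * (\mathrm{a2} \mapsto R(\mathrm{a2})) \\
\mathrm{Cont}(n+1, \Psi, pc) & \triangleq & \lambda S.\ \Psi(pc) = (p, g) \wedge (p\ S) \\
 & & \wedge\ (\forall S'.\ g\ S\ S' \rightarrow (\exists \mathit{retaddr}.\ (\mathrm{ra} \hookrightarrow \mathit{retaddr}) \mathbin{\wedge\!\!\wedge} \mathrm{Cont}(n, \Psi, \mathit{retaddr}))\ S') \\
\mathrm{Cont}(0, \Psi, pc) & \triangleq & \lambda S.\ \Psi(pc) = (p, g) \wedge (p\ S) \wedge (\forall S'.\ g\ S\ S' \rightarrow \mathrm{False}) \\
\mathrm{CThrd}(\Psi, t, pc) & \triangleq & \exists n.\ \mathrm{Cont}(n, \Psi, pc) \mathbin{\wedge\!\!\wedge} ([t] * \mathrm{true}) \\
\mathrm{RThrd}(\Psi, t, R) & \triangleq & \lfloor R \rfloor * [t] * \exists t'.\ \langle t' \rangle * \circ I(t', t) \mathrel{-\!\!*} \mathrm{CThrd}(\Psi, t, R(\mathrm{ra})) \\
\mathrm{GINV}(\Psi, P, pc) & \triangleq & \mathrm{CThrd}(\Psi, t, pc) * \mathrm{RThrd}(\Psi, t_0, R_0) * \cdots * \mathrm{RThrd}(\Psi, t_n, R_n) \\
 & & \text{where } P = \{t \leadsto \mathrm{run},\ t_0 \leadsto (\mathrm{rdy}, R_0),\ \ldots,\ t_n \leadsto (\mathrm{rdy}, R_n)\}
\end{array}
$$

Fig. 8. Concrete threads and the global invariant


struct tcb {
    struct context ctxt;    /* saved machine context */
    struct tcb *prev;       /* links of the doubly linked thread queue */
    struct tcb *next;
};

struct queue {
    struct tcb *head;
    struct tcb *tail;
};

struct tcb *cur;            /* TCB of the running thread */
struct queue rq;            /* queue of ready threads */

void schedule_p2()
{
    struct tcb *old, *new;

    old = cur;
    new = deq(&rq);                 /* pick a candidate */
    if (new == NULL) return;        /* no ready thread: keep running */
    enq(&rq, old);                  /* the current thread becomes ready */
    cur = new;
    cswitch(old, new);
    return;
}

Fig. 9. Pseudo C code for schedule_p2()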

The soundness property of our proof system states that any program that is well-formed
in our proof system runs safely on the abstract machine. The property is proved using
the global invariant GINV, which holds throughout machine execution. We first prove that
any machine configuration satisfying GINV can take one step forward. We then prove that
if a machine configuration satisfying GINV takes a step, the next machine configuration
also satisfies GINV. The soundness theorem of our proof system follows from these two
facts. The proof of the
soundness  theorem  has  been  formalized  in  Coq  Q. 
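The two proof obligations described above follow the familiar progress-and-preservation pattern. The block below is a schematic restatement of the prose, not the theorem statement of the Coq development; writing an abstract machine configuration as $W = (C, M, R, P, pc)$ is an assumption made for this sketch.

```latex
% Schematic only. Assumed shapes: W = (C, M, R, P, pc) and W' = (C, M', R', P', pc').
\begin{align*}
\textbf{Progress:} \quad
  & \mathrm{GINV}(\Psi, P, pc)\ (M, R, P)
    \;\Longrightarrow\; \exists W'.\ W \longmapsto W' \\
\textbf{Preservation:} \quad
  & \mathrm{GINV}(\Psi, P, pc)\ (M, R, P) \,\wedge\, W \longmapsto W'
    \;\Longrightarrow\; \mathrm{GINV}(\Psi, P', pc')\ (M', R', P')
\end{align*}
```

Under this reading, safety of a well-formed program follows by induction on the number of execution steps: the invariant holds initially, preservation carries it to every reachable configuration, and progress shows that none of them is stuck.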

5  Verification  cases 


In this section, we show how to use the proof system to verify two schedulers, of
patterns (II) and (III) shown in Fig. 1. We give the code in pseudo C to explain
the programs and their specifications. The corresponding assembly code and selected
assertions of the two schedulers are shown in Fig. 10.

Scheduler as function. The scheduler function schedule_p2() (see Fig. 9) follows the
process discussed in Sec. 2. The functions deq() and enq() are used to remove and
insert nodes in thread queues. The main task of the scheduler is to choose a candidate
from the thread queue and then perform a context switch from the current thread to the
candidate. There are two global variables, cur and rq. The variable cur points to the
TCB of the running thread; rq points to the thread queue containing the TCBs of all other
runnable (ready) threads.
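The text uses deq() and enq() without showing their bodies. The following is a minimal sketch of one possible implementation over the doubly linked queue of Fig. 9. It is an assumption, not the verified code: the `id` field stands in for the machine context, enq() inserts at the head (consistent with the shape RQ(rq, t :: L) that appears in the assertions after enq), and removing from the tail in deq() is a choice made here to obtain FIFO order.

```c
#include <assert.h>
#include <stddef.h>

struct tcb {
    int id;             /* hypothetical stand-in for struct context */
    struct tcb *prev;
    struct tcb *next;
};

struct queue {
    struct tcb *head;
    struct tcb *tail;
};

/* Insert t at the head of the queue. */
void enq(struct queue *q, struct tcb *t)
{
    assert(q != NULL && t != NULL);
    t->prev = NULL;
    t->next = q->head;
    if (q->head != NULL)
        q->head->prev = t;
    else
        q->tail = t;        /* queue was empty: t is also the tail */
    q->head = t;
}

/* Remove and return the tail of the queue, or NULL if the queue is empty. */
struct tcb *deq(struct queue *q)
{
    assert(q != NULL);
    struct tcb *t = q->tail;
    if (t == NULL)
        return NULL;
    q->tail = t->prev;
    if (q->tail != NULL)
        q->tail->next = NULL;
    else
        q->head = NULL;     /* queue became empty */
    t->prev = NULL;
    t->next = NULL;
    return t;
}
```

With these definitions, two threads enqueued in the order a, b are dequeued in the same order, and deq() on an empty queue returns NULL, which is the case schedule_p2() handles by returning without switching.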

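The notation $l \stackrel{\mathit{field}}{\longmapsto} w$ specifies a named field in a
structure. The notation ptcb(t) specifies the part of a TCB that includes the fields next
and prev. The predicate RQ(q, L) specifies a doubly linked list serving as a thread queue
pointed to by q, where L is the list of thread IDs in the queue. We also use
$\langle L \rangle$ as an abbreviation for
$\langle t_0 \rangle * \langle t_1 \rangle * \cdots * \langle t_n \rangle$ if L is
$t_0 :: t_1 :: \cdots :: t_n :: \mathrm{nil}$, and use
$l \stackrel{(n)}{\longmapsto} \_$ to specify n contiguous memory cells.

$$
\begin{array}{rcl}
l \stackrel{\mathit{field}}{\longmapsto} w & \triangleq & (l + \text{offset of the field in the struct}) \mapsto w \\
\mathrm{ptcb}(t) & \triangleq & (t \stackrel{\mathit{prev}}{\longmapsto} \_) * (t \stackrel{\mathit{next}}{\longmapsto} \_) \\
\mathrm{RQseg}(pv, tl, t, \mathrm{nil}) & \triangleq & (t \stackrel{\mathit{prev}}{\longmapsto} pv) * (t \stackrel{\mathit{next}}{\longmapsto} \mathrm{NULL}) * \sharp(t = tl) \\
\mathrm{RQseg}(pv, tl, t, t' :: L') & \triangleq & (t \stackrel{\mathit{prev}}{\longmapsto} pv) * (t \stackrel{\mathit{next}}{\longmapsto} t') * \mathrm{RQseg}(t, tl, t', L') \\
\mathrm{RQ}(q, \mathrm{nil}) & \triangleq & (q \stackrel{\mathit{head}}{\longmapsto} \mathrm{NULL}) * (q \stackrel{\mathit{tail}}{\longmapsto} \mathrm{NULL}) \\
\mathrm{RQ}(q, t :: L) & \triangleq & \exists pv.\ \exists tl.\ (q \stackrel{\mathit{head}}{\longmapsto} t) * (q \stackrel{\mathit{tail}}{\longmapsto} tl) * \mathrm{RQseg}(pv, tl, t, L) \\
\mathrm{K}(bp, n, w_0 :: w_1 :: \cdots :: w_m :: \mathrm{nil}) & \triangleq & \exists sp.\ (\mathrm{sp} \mapsto sp) * \sharp(sp = bp + 4n) * (bp \stackrel{(n)}{\longmapsto} \_) \\
 & & *\ (sp \mapsto w_0) * (sp + 4 \mapsto w_1) * \cdots * (sp + 4m \mapsto w_m) \\
\mathrm{K}(bp, n) & \triangleq & \mathrm{K}(bp, n, \mathrm{nil})
\end{array}
$$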


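The specification of schedule_p2() is shown below:

$$
\left\{\begin{array}{l}
[t] * \mathrm{ptcb}(t) * (\mathrm{cur} \mapsto t) * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{ra} \mapsto ret) * \mathrm{K}(bp, 20) * (\mathrm{v0}, \mathrm{a0}, \mathrm{a1} \mapsto \_, \_, \_) \\
{[t]} * \mathrm{ptcb}(t) * (\mathrm{cur} \mapsto t) * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{ra} \mapsto ret) * \mathrm{K}(bp, 20) * (\mathrm{v0}, \mathrm{a0}, \mathrm{a1} \mapsto \_, \_, \_)
\end{array}\right\}
$$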

Here we use the notation $\mathrm{K}(bp, n, w :: w' :: \cdots)$ to describe a stack
frame. The first parameter bp is the base address of the stack frame. The second
parameter n is the size of the unused space (in words). The third parameter is a list of
words representing the values on the stack from the top down, that is, the leftmost value
in the list is the topmost value in the stack frame. If the stack frame is empty, we omit
the third parameter.

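The abstract invariant I is instantiated to a concrete definition specifying the shared
resources before and after a context switch for this implementation of the scheduler.

$$
I(t, t') \triangleq \mathrm{ptcb}(t') * (\mathrm{cur} \mapsto t') * \exists L.\ \mathrm{RQ}(\mathrm{rq}, t :: L) * \langle L \rangle
$$

schedule_p2:
    $\{\, [t] * \mathrm{ptcb}(t) * (\mathrm{cur} \mapsto t) * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto \_, \_, \_, ret) * \mathrm{K}(bp, 20) \,\}$
    subi sp, 12
    sw   ra, 8(sp)
    movi a0, cur
    lw   v0, 0(a0)
    sw   v0, 0(sp)
    $\{\, [t] * \mathrm{ptcb}(t) * (\mathrm{cur} \mapsto t) * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto \mathrm{cur}, \_, t, \_) * \mathrm{K}(bp, 17, t :: \_ :: ret :: \mathrm{nil}) \,\}$
    movi a0, rq
    call deq
    bz   v0, Ls_ret
    $\{\, [t] * \mathrm{ptcb}(t) * \langle t' \rangle * \mathrm{ptcb}(t') * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto \mathrm{rq}, \_, t', \_) * \mathrm{K}(bp, 17, t :: \_ :: ret :: \mathrm{nil}) * (\mathrm{cur} \mapsto t) \,\}$
    sw   v0, 4(sp)
    lw   a1, 0(sp)
    call enq
    $\{\, [t] * \langle t' \rangle * \mathrm{ptcb}(t') * \exists L.\ \mathrm{RQ}(\mathrm{rq}, t :: L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto \mathrm{rq}, t, 0, \_) * \mathrm{K}(bp, 17, t :: t' :: ret :: \mathrm{nil}) * (\mathrm{cur} \mapsto t) \,\}$
    lw   a1, 4(sp)
    movi a0, cur
    sw   a1, 0(a0)
    lw   a0, 0(sp)
    $\{\, [t] * \langle t' \rangle * \exists L.\ \mathrm{RQ}(\mathrm{rq}, t :: L) * \langle L \rangle * \mathrm{ptcb}(t') * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto t, t', 0, \_) * \mathrm{K}(bp, 17, t :: t' :: ret :: \mathrm{nil}) * (\mathrm{cur} \mapsto t') \,\}$
    cswitch
    $\{\, [t] * \mathrm{ptcb}(t) * \exists t''.\ \langle t'' \rangle * \exists L.\ \mathrm{RQ}(\mathrm{rq}, t'' :: L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto t, t', \_, \_) * \mathrm{K}(bp, 17, t :: \_ :: ret :: \mathrm{nil}) * (\mathrm{cur} \mapsto t) \,\}$
Ls_ret:
    lw   ra, 8(sp)
    addi sp, 12
    $\{\, [t] * \mathrm{ptcb}(t) * (\mathrm{cur} \mapsto t) * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto \_, \_, \_, ret) * \mathrm{K}(bp, 20) \,\}$
    ret

schedth:
    $\{\, [\mathrm{sched}] * (\mathrm{cur} \mapsto \_) * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto \_, \_, \_, \_) * \exists bp.\ \mathrm{K}(bp, 10) \,\}$
    movi a0, rq
    call deq
    bz   v0, schedth
    movi a2, cur
    sw   v0, 0(a2)
    mov  a1, v0
    lw   a0, sched
    $\{\, [\mathrm{sched}] * \langle t' \rangle * (\mathrm{cur} \mapsto t') * \mathrm{ptcb}(t') * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto \mathrm{sched}, t', \_, \_) * \exists bp.\ \mathrm{K}(bp, 10) \,\}$
    cswitch
    $\{\, [\mathrm{sched}] * \exists t''.\ \langle t'' \rangle * \mathrm{ptcb}(t'') * (\mathrm{cur} \mapsto t'') * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * \exists bp.\ \mathrm{K}(bp, 10) * (\mathrm{a0}, \mathrm{a1}, \mathrm{v0}, \mathrm{ra} \mapsto \mathrm{sched}, \_, \_, \_) \,\}$
    movi a0, rq
    lw   a1, 0(a2)
    call enq
    jmp  schedth

schedule_p3:
    $\{\, [t] * \mathrm{ptcb}(t) * \langle \mathrm{sched} \rangle * (\mathrm{cur} \mapsto t) * (\mathrm{a0}, \mathrm{a1}, \mathrm{ra} \mapsto \_, \_, ret) * \mathrm{K}(bp, 10) \,\}$
    subi sp, 4
    sw   ra, 0(sp)
    movi a1, cur
    lw   a0, 0(a1)
    movi a1, sched
    $\{\, [t] * \mathrm{ptcb}(t) * \langle \mathrm{sched} \rangle * (\mathrm{cur} \mapsto t) * (\mathrm{a0}, \mathrm{a1}, \mathrm{ra} \mapsto t, \mathrm{sched}, ret) * \mathrm{K}(bp, 9, ret) \,\}$
    cswitch
    $\{\, [t] * \mathrm{ptcb}(t) * \langle \mathrm{sched} \rangle * (\mathrm{cur} \mapsto t) * (\mathrm{a0}, \mathrm{a1}, \mathrm{ra} \mapsto \_, \_, ret) * \mathrm{K}(bp, 9, ret) \,\}$
    lw   ra, 0(sp)
    addi sp, 4
    $\{\, [t] * \mathrm{ptcb}(t) * \langle \mathrm{sched} \rangle * (\mathrm{cur} \mapsto t) * (\mathrm{a0}, \mathrm{a1}, \mathrm{ra} \mapsto \_, \_, ret) * \mathrm{K}(bp, 10) \,\}$
    ret

Fig. 10. Verification of schedule_p2(), schedth() and schedule_p3()


struct tcb sched;           /* TCB of the scheduler thread */
struct tcb *cur;
struct queue rq;

void schedule_p3()
{
    cswitch(cur, &sched);   /* hand control to the scheduler thread */
    return;
}

void schedth()
{
    while (1) {
        cur = deq(&rq);     /* the assembly in Fig. 10 retries when deq() yields NULL */
        cswitch(&sched, cur);
        enq(&rq, cur);
    }
}

Fig. 11. Pseudo C code for schedule_p3()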


Scheduler as a separate thread. A scheduler of pattern (III) is implemented as a
separate thread (see Fig. 11), which does the scheduling work in an infinite loop. A
global variable sched is added to represent the TCB of the scheduler thread. A stub
function schedule_p3() can be invoked by other threads to request scheduling. As shown
below, the specification of schedule_p3() differs from that of schedule_p2(). The
schedule function in this implementation does not own the thread queue; the queue is
owned by the scheduler thread (sched) instead, since all operations on the thread queue
are moved into the separate thread.


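$$
\left\{\begin{array}{l}
[t] * \mathrm{ptcb}(t) * (\mathrm{cur} \mapsto t) * \langle \mathrm{sched} \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{ra} \mapsto \_, \_, ret) * \mathrm{K}(bp, 10) \\
{[t]} * \mathrm{ptcb}(t) * (\mathrm{cur} \mapsto t) * \langle \mathrm{sched} \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{ra} \mapsto \_, \_, ret) * \mathrm{K}(bp, 10)
\end{array}\right\}
$$

The specification of the schedth() function is shown below:

$$
\left\{\begin{array}{l}
[\mathrm{sched}] * (\mathrm{cur} \mapsto \_) * \exists L.\ \mathrm{RQ}(\mathrm{rq}, L) * \langle L \rangle * (\mathrm{a0}, \mathrm{a1}, \mathrm{a2}, \mathrm{v0}, \mathrm{ra} \mapsto \_, \_, \_, \_, \_) * \exists bp.\ \mathrm{K}(bp, 10) \\
\mathrm{false}
\end{array}\right\}
$$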


Since the ready thread queue is owned only by the scheduler thread, it does not need to
be shared with other threads, nor does it occur in the invariant for the shared
resources, I:

$$
I(t, t') \triangleq (\sharp(t' = \mathrm{sched}) * (\mathrm{cur} \mapsto t) * \mathrm{ptcb}(t)) \mathbin{\vee\!\!\vee} (\sharp(t = \mathrm{sched}) * (\mathrm{cur} \mapsto t') * \mathrm{ptcb}(t'))
$$

The invariant $I(t, t')$ is defined by two cases on the direction of the context switch:
if the destination thread is the scheduler thread, $I(t, t')$ requires that the value in
cur be equal to the ID of the source thread, t; if the source thread is the scheduler
thread, $I(t, t')$ requires that the value in cur be equal to the ID of the destination
thread, t'.
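The case analysis in this invariant can be mirrored by a small executable predicate. The sketch below checks only the pure constraint that the invariant places on the value of cur; it does not model ownership of cur or of the ptcb resources. The type `tid`, the constant `SCHED` and the function name `inv_p3_pure` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef int tid;            /* hypothetical thread-ID type */
#define SCHED ((tid)0)      /* hypothetical ID of the scheduler thread */

/* Pure part of I(src, dst) for pattern (III), given the value stored in cur.
 * Case 1: switching to the scheduler, cur must name the source thread.
 * Case 2: switching from the scheduler, cur must name the destination thread. */
bool inv_p3_pure(tid src, tid dst, tid cur)
{
    bool to_sched   = (dst == SCHED) && (cur == src);
    bool from_sched = (src == SCHED) && (cur == dst);
    return to_sched || from_sched;
}
```

For example, `inv_p3_pure(3, SCHED, 3)` holds because thread 3 yields to the scheduler while cur still names it, whereas `inv_p3_pure(3, 5, 3)` fails: in pattern (III) every context switch has the scheduler thread on one side.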

6  Related  work  and  conclusions 

Gotsman and Yang ||6l proposed a two-layer framework to verify schedulers. The proof
system in the lower layer is for verifying code that manipulates TCBs, while the upper
layer is for verifying the remaining concurrent code of the kernel. Since thread queues
and TCBs are hidden from the upper layer, a thread cannot have any knowledge of the
others; thus their proof system is unable to verify scheduling patterns (II) and (III).
Similar to our assertion RThrd(...), they introduced a primitive predicate Process(G) to
relate TCBs in the lower layer with threads in the upper layer, but there is no
counterpart of $\langle t \rangle$ in their framework.

Feng et al. also verified a kernel prototype 13 in a two-layer framework. Code
manipulating TCBs needs to be verified in the lower layer of their framework. The
TCBs are connected with actual threads in the upper layer by an interpretation function
of their framework. Our use of the global invariant is similar to their use of the
interpretation function. In the upper layer, information about threads is completely
hidden. Thus, their framework also fails to support the verification of scheduling
patterns (II) and (III).

Ni et al. verified a small thread manager with a logic system II15I14II supporting
modular reasoning about code including embedded code pointers. In their logic, however,
there is no abstraction of threads. Multithreaded programs are seen as sequential
interleavings of pieces of code in low-level continuation-passing style. Therefore, TCBs
with embedded code pointers can be treated as normal data. But since the reasoning
level is too low, without any abstraction, TCBs have to be specified by over-complicated
logic expressions, which makes it very difficult to apply their method to realistic code.

Klein et al. verified a micro-kernel, seL4 im, where the kernel code runs sequentially.
Thus they used a sequential proof system to verify most of the kernel code. The
scheduling pattern of seL4 is similar to our pattern (I), but they trusted the code doing
context saving and loading, and left it unverified. Since they do not verify user
processes on top of the kernel, they need not relate TCBs in the kernel with actual user
processes.

Gargano et al. used a framework CVM 111 to build verified kernels in the Verisoft
project. CVM is a computational model for concurrent user processes, which interleave
through a micro-kernel. Starostin and Tsyban presented a formal approach m to reason
about context switches between user processes. The context switch code and proofs
are integrated in a framework for building verified kernels (CVM) ifTOll. Their framework
keeps a global invariant, weak consistency, to relate TCBs in the kernel with user
processes outside the kernel. Since the kernel itself is sequential, their process
scheduling follows pattern (I). The other two patterns cannot be verified.

In this paper, we proposed a novel approach to verify concurrent thread management
code, which allows multiple threads to modify their own thread control blocks.
The assertions of the code and inference rules of the proof system are straightforward
and easy to follow. Moreover, it can be easily extended to support other kernel features
(e.g., preemptive scheduling, multi-core systems, synchronization) and to be practically
applied to realistic OS code.

Abstract. A virtual memory manager (VMM) is a part of an operating system
that provides the rest of the kernel with an abstract model of memory. Although
small in size, it involves complicated and interdependent invariants that make
monolithic verification of the VMM and the kernel running on top of it difficult.
In this paper, we make the observation that a VMM is constructed in layers: physical
page allocation, page table drivers, address space API, etc., each layer providing
an abstraction that the next layer utilizes. We use this layering to simplify
the verification of individual modules of the VMM and then to link them together
by composing a series of small refinements. The compositional verification also
supports function calls from less abstract layers into more abstract ones, allowing
us to simplify the verification of initialization functions as well. To facilitate
such compositional verification, we develop a framework that assists in the creation
of verification systems for each layer and refinements between the layers. Using
this framework, we have produced a certification of BabyVMM, a small VMM
designed for simplified hardware. The same proof also shows that a certified kernel
using BabyVMM's virtual memory abstraction can be refined following a
similar sequence of refinements, and can then be safely linked with BabyVMM.
Both the verification framework and the entire certification of BabyVMM have
been mechanized in the Coq Proof Assistant.


1  Introduction 

Software systems are complex feats of engineering. What makes them possible is the
ability to isolate and abstract modules of the system. In this paper, we consider an
operating system kernel that uses virtual memory. The majority of the kernel makes an
assumption that the memory is a large space with virtual addresses and a specific
interface that allows the kernel to request access to any particular page in this large
space. In reality, this entire model of memory is in the imagination of the programmer,
supported by a relatively small but important portion of the kernel called the virtual
memory manager. The job of the virtual memory manager is to handle all the complexities
of the real machine architecture to provide the primitives that the rest of the kernel
can use. This is exactly how the programmer would reason about this software system.

However, when we consider verification of such code, current approaches are mostly
monolithic in nature. Abstraction is generally limited to abstract data types, but such
abstraction cannot capture changes in the semantics of computation. For example, it
is impossible to use abstract data types to make virtual memory appear to work like
physical memory without changing operational semantics. To create such an abstraction, a
change of computational model is required. In the Verisoft project [11, 18], the abstract
virtual memory is defined by creating the CVM model from the VAMP architecture. In
AIM [7], multiple machines are used to define interrupts in the presence of a scheduler.

These transitions to more abstract models of computation tend to be quite rare,
and when present tend to be complex. The previously mentioned VAMP-CVM jump
in Verisoft abstracts most of the kernel functionality in one step. In our opinion, it
would be better to have more abstract computation models, with smaller jumps in
abstraction. First, it is easier to verify code in the most abstract computational model
possible. Second, smaller abstractions tend to be easier to prove and to maintain, while
larger abstractions can still be achieved by composing the smaller ones. Third, more
abstractions mean more modularity; changes in the internals of one module will not have
global effects.

However, we do not commonly see Hoare-logic verification that encourages multiple
models. The likely reason is that creating abstract models and linking across them
is seen as ad-hoc and tedious additional work. In this paper we show how to reduce
the effort required to define models and linking, so that code verification using
multiple abstractions becomes an effective approach. More precisely, our paper makes the
following contributions:

-  We present a framework for quickly defining multiple abstract computational models
   and their verification systems.

-  We show how our framework can be used to define safe cross-abstraction linking.

-  We show how to modularize a virtual memory manager and define abstract computational
   models for each layer of the VMM.

-  We show a complete verification of a small proof-of-concept virtual memory manager
   using the Coq Proof Assistant.

The  rest  of  this  paper  is  organized  as  follows.  In  Section  2,  we  give  an  informal 
overview  of  our  work.  In  Section  3,  we  discuss  the  formal  details  of  our  verification 
and refinement framework. In Section 4, we specialize the framework for a simple C-like
language. In Section 5, we certify BabyVMM, our small virtual memory manager.
Section  6  discusses  the  Coq  proof,  and  Section  7  presents  related  work  and  concludes. 

2  Overview  and  Plan  for  Certification 

We begin the overview by explaining the design of BabyVMM, our small virtual memory
manager. First, consider the model of memory present in simplified hardware (left
side of Figure 1). The memory is a storage system, which contains cells that can be
read from or written to by the software. These cells are indexed by addresses. However,
to facilitate indirection, the hardware includes a system called address translation (AT),
which, when enabled, will cause all requests for specific addresses from the software
to be translated. The AT system adds special registers to the memory system: one to
enable or disable AT, and the other to point to where the software-managed AT tables are
located in memory. The fact that these tables are stored in memory is one of the sources
of complexity in the AT system: updating AT tables requires updating in-memory tables,
a process which goes through AT as well.

Fig. 1. Hardware (HW) and Address Space (AS) Models of Memory


Fig.  2.  Allocated  (ALE)  and  Page  Map  (PMAP)  Models  of  Memory 


Because AT is such a complicated, machine-dependent, and general mechanism,
BabyVMM creates an abstraction that defines specific restrictions on how AT will be
used, and presents a simpler view of AT to the kernel. Although the abstract models of
memory may differ depending on the features that the kernel may require, BabyVMM
defines a very basic model, to which we refer as the address space (AS) model of memory
(right side of Figure 1). The AS model replaces the small physical memory with
a larger virtual address space with allocatable pages and no address translation. The
space is divided into high and low areas, where the low area is actually a window into
physical memory (a pattern common in many kernels). Because of this distinction, the
memory model has two sets of allocation functions, one for the "high" memory area,
where the programmer requests a specific page for allocation, and one for the "low"
memory area, where the programmer cannot pick which page to allocate.

However, creating an abstraction that makes the jump from the HW model directly
to the AS model is complex. As a result, we create two more intermediate models, which
gradually build up the abstraction. The first model is ALE (left side of Figure 2), which
incorporates allocation information into the hardware memory, requiring that programs
only access memory locations that are marked allocated. The model adds primitives
in the form of mem_alloc and mem_free, with the same semantics as the ones in the AS
model. Although this is not shown on the diagram, the ALE model still maintains the
hardware's AT mechanism.

Fig. 3. Complete Plan for VMM Certification

The second intermediate level, which we call PMAP (right side of Figure 2), is
designed to replace the hardware's AT mechanism with an abstract one. The model
features a page map that exists outside the normal memory space, unlike the lower level
models. The page map maps virtual page numbers to physical page numbers, with a 0
value meaning invalid. In our particular design, the page map is always the identity for
the lower addresses, creating a window into physical memory from within the virtual space.
The model still contains allocation primitives, and adds two more primitives, pt_set
and pt_lookup, which update and look up values in the page map.

Using  these  abstract  memory  models,  we  can  construct  the  BabyVMM  verification 
plan  (Figure  3).  The  light-yellow  boxes  in  the  kernel  represent  the  actual  functions 
(actual  code  is  given  in  Appendix  A  of  TR[19]).  The  darker  green  boxes  represent 
computational  models  with  primitives  labeled.  The  diagram  shows  how  each  module  of 
BabyVMM  will  be  certified  in  the  model  best  suited  for  it.  For  example,  the  high-level 
kernel  is  certified  in  the  AS  model,  meaning  that  it  does  not  see  underlying  physical 
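memory at all. The implementations of as_request and as_release are defined over

\[
\begin{array}{llll}
\text{(State)} & \mathbb{S} \in \Sigma & \text{(State Predicate)} & p \in \Sigma \to \mathit{Prop} \\
\text{(Operation)} & \iota \in \Delta & \text{(State Relation)} & g \in \Sigma \to \Sigma \to \mathit{Prop} \\
\text{(Cond)} & b \in \beta & \text{(Operational Semantics)} & OS \in \{\iota \leadsto (p, g)\}^{*} \\
\text{(CondInterp)} & \Upsilon \in \beta \to \Sigma \to \mathit{Prop} & \text{(Language / Machine)} & M \in (\Sigma, \Delta, \beta, \Upsilon, OS)
\end{array}
\]
where $M(\iota) \triangleq M.OS(\iota)$ and $M(b) \triangleq M.\Upsilon(b)$

Fig. 4. Abstract State Machine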


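\[
\begin{array}{rcl}
\mathit{id} & \triangleq & (\lambda \mathbb{S}.\ \mathit{True},\ \ \lambda \mathbb{S}.\ \lambda \mathbb{S}'.\ \mathbb{S}' = \mathbb{S}) \\
\mathit{fail} & \triangleq & (\lambda \mathbb{S}.\ \mathit{False},\ \ \lambda \mathbb{S}.\ \lambda \mathbb{S}'.\ \mathit{False}) \\
\mathit{loop} & \triangleq & (\lambda \mathbb{S}.\ \mathit{True},\ \ \lambda \mathbb{S}.\ \lambda \mathbb{S}'.\ \mathit{False}) \\
(p, g) \circ (p', g') & \triangleq & (\lambda \mathbb{S}.\ p\ \mathbb{S} \land \forall \mathbb{S}'.\ g\ \mathbb{S}\ \mathbb{S}' \to p'\ \mathbb{S}',\ \ \lambda \mathbb{S}.\ \lambda \mathbb{S}''.\ \exists \mathbb{S}'.\ g\ \mathbb{S}\ \mathbb{S}' \land g'\ \mathbb{S}'\ \mathbb{S}'') \\
(p, g) \oplus_{c} (p', g') & \triangleq & (\lambda \mathbb{S}.\ (p\ \mathbb{S} \land c\ \mathbb{S}) \lor (p'\ \mathbb{S} \land \lnot c\ \mathbb{S}),\ \ \lambda \mathbb{S}.\ \lambda \mathbb{S}'.\ (c\ \mathbb{S} \land g\ \mathbb{S}\ \mathbb{S}') \lor (\lnot c\ \mathbb{S} \land g'\ \mathbb{S}\ \mathbb{S}')) \\
(p, g) \supseteq (p', g') & \triangleq & (\forall \mathbb{S}.\ p\ \mathbb{S} \to p'\ \mathbb{S}) \land (\forall \mathbb{S}, \mathbb{S}'.\ g'\ \mathbb{S}\ \mathbb{S}' \to g\ \mathbb{S}\ \mathbb{S}')
\end{array}
\]

Fig. 5. Combinators and Properties of Actions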


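\[
\begin{array}{llcl}
\text{(Meta-program)} & P & ::= & (C, I) \\
\text{(Proc)} & I & ::= & \mathit{nil} \mid \iota \mid [l] \mid I_1; I_2 \mid (b\ ?\ I_1 + I_2) \\
\text{(Proc Heap)} & C & ::= & \{l \leadsto I\}^{*} \\
\text{(Labels)} & l & ::= & n \ \text{(nat numbers)} \\
\text{(Spec Heap)} & \Psi & ::= & \{l \leadsto (p, g)\}^{*}
\end{array}
\]
\[
\begin{array}{rcl}
[\![ C, I ]\!]^{0}_{M} & \triangleq & \mathit{loop} \\
[\![ C, \mathit{nil} ]\!]^{n}_{M} & \triangleq & \mathit{id} \\
[\![ C, \iota ]\!]^{n}_{M} & \triangleq & M(\iota) \\
[\![ C, [l] ]\!]^{n}_{M} & \triangleq & [\![ C, C(l) ]\!]^{n-1}_{M} \\
[\![ C, I_1; I_2 ]\!]^{n}_{M} & \triangleq & [\![ C, I_1 ]\!]^{n}_{M} \circ [\![ C, I_2 ]\!]^{n}_{M} \\
[\![ C, (b\ ?\ I_1 + I_2) ]\!]^{n}_{M} & \triangleq & [\![ C, I_1 ]\!]^{n}_{M} \oplus_{M(b)} [\![ C, I_2 ]\!]^{n}_{M}
\end{array}
\]

Fig. 6. Syntax and Semantics of the Meta-Language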

an abstract page map, and thus do not have to know how the hardware deals with page
tables, and so on. The plan also indicates which primitives are implemented by which
code (lines with circles). When we certify the code, these will be the cross-abstraction
links we will have to prove. Lastly, the plan also indicates the stubs in the initialization,
which are needed to certify calls from init to functions defined over higher abstractions.
The PE and PD models are restrictions on the HW model, where AT is always on and
always off, respectively. ALD is an analogue of ALE, where AT is off.

On boot, AT is off, and init is called. The init function then calls mem_init to initialize
the allocation table and pt_init to initialize the page tables. Then, init uses the HW
primitives to enable AT, and jumps into the high-level kernel by calling kernel_init.
We will now focus on the technical details needed to put this plan into action.


3  Certifying  with  Refinement 


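Our framework for multi-machine certification is defined in two parts. First, we create
a machine-independent verification framework that will allow us to define quickly and
easily as many machines for verification as we need. Second, we will develop our notion
of refinements, which will allow us to link all the separate machines together.

\[
\frac{\forall l \in \mathrm{dom}(C).\ \ M, \Psi \cup \mathbb{L} \vdash C(l) : \Psi(l)}{M, \mathbb{L} \vdash C : \Psi}\ \text{(code)}
\qquad
\frac{M, \Psi \vdash I : (p', g') \quad (p, g) \supseteq (p', g')}{M, \Psi \vdash I : (p, g)}\ \text{(weak)}
\]
\[
\frac{M, \Psi \vdash I' : (p', g') \quad M, \Psi \vdash I'' : (p'', g'')}{M, \Psi \vdash (b\ ?\ I' + I'') : (p', g') \oplus_{M(b)} (p'', g'')}
\qquad
\frac{M, \Psi \vdash I' : (p', g') \quad M, \Psi \vdash I'' : (p'', g'')}{M, \Psi \vdash I'; I'' : (p', g') \circ (p'', g'')}\ \text{(seq)}
\]
\[
\frac{}{M, \Psi \vdash \iota : M(\iota)}\ \text{(perf)}
\qquad
\frac{}{M, \Psi \vdash [l] : \Psi(l)}\ \text{(call)}
\qquad
\frac{}{M, \Psi \vdash \mathit{nil} : \mathit{id}}\ \text{(nil)}
\]

Fig. 7. Static Semantics of the Meta-Language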


3.1  A  Machine-Independent  Certification  Framework 

Our Hoare-logic based framework is parametric over the definition of operational
semantics of the machine, and is sound no matter what machine semantics it is
parameterized with. To begin defining such a framework, we first need to understand what
exactly is a machine on which we can certify code. The definition that we use is given
in Figure 4. Our notion of the machine consists of the following parts:

-  State type ($\Sigma$). Defines the set of all possible states in a machine.

-  Operations ($\Delta$). This is a set of names of all operations that the machine supports.
   The set can be infinite, and defined parametrically.

-  Conditionals ($\beta$). Defines a type of expressions that are used for branching.

-  Conditional Interpreter ($\Upsilon$). Converts conditionals into state predicates.

-  The operational semantics OS. This is the main portion of the machine definition. It
   is a set of actions (p, g) named by all operations in the machine.

The most important part of the machine is its semantics (OS). The semantics of an
operation is defined by a precondition (p), which shows when the operation is safe to
execute, and by a state relation (g) that defines the set of possible states that the
operation may result in. We will refer to the pair (p, g) as the action of the operation.
Later we will also use actions to define the specifications of programs. Because the type
of actions is somewhat complex, we define action combinators in Figure 5, including
composition and branching. The same figure also shows the weaker-than relation between
actions.
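
The combinators of Figure 5 can be tried out on a small finite state space. The following
Python sketch is only an illustration and is not part of the Coq development: it assumes a
hypothetical state space of four counter values, represents an action as a pair of Python
functions (p, g), and uses the made-up actions INC and ANY_INC as examples.

```python
from itertools import product

# Hypothetical finite state space (an assumption made for illustration):
# a single counter that takes the values 0..3.
STATES = range(4)

# An action is a pair (p, g): p is the precondition, g the state relation.
ID = (lambda s: True, lambda s, t: t == s)
FAIL = (lambda s: False, lambda s, t: False)
LOOP = (lambda s: True, lambda s, t: False)


def compose(a, b):
    """Sequential composition (p, g) o (p', g')."""
    (p, g), (p2, g2) = a, b

    def pre(s):
        # a is safe, and b is safe in every state that a may produce.
        return p(s) and all(p2(m) for m in STATES if g(s, m))

    def rel(s, t):
        # Some intermediate state m links s to t.
        return any(g(s, m) and g2(m, t) for m in STATES)

    return (pre, rel)


def branch(c, a, b):
    """Branching: behave like a where c holds, like b elsewhere."""
    (p, g), (p2, g2) = a, b

    def pre(s):
        return (p(s) and c(s)) or (p2(s) and not c(s))

    def rel(s, t):
        return (c(s) and g(s, t)) or (not c(s) and g2(s, t))

    return (pre, rel)


def weaker(a, b):
    """a is weaker than b: p implies p', and g' is contained in g."""
    (p, g), (p2, g2) = a, b
    pre_ok = all(p2(s) for s in STATES if p(s))
    rel_ok = all(g(s, t) for s, t in product(STATES, STATES) if g2(s, t))
    return pre_ok and rel_ok


# Example actions (hypothetical): a deterministic increment and a looser
# specification that allows any strictly larger counter value.
INC = (lambda s: s < 3, lambda s, t: t == s + 1)
ANY_INC = (lambda s: s < 3, lambda s, t: t > s)
INC_TWICE = compose(INC, INC)
```

In this sketch, weaker(ANY_INC, INC) holds while weaker(INC, ANY_INC) does not, which
matches the intuition that a looser specification admits the more precise action but not
the other way around.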

Although at this point we have defined our machines, they do not yet have any notion of
computation. To make use of a machine, we need to define a concept of programs,
as well as what it means for a particular program to execute.

The definition of the program is given in Figure 6. The most important definition
in that figure is that of the procedure, I. A procedure is a piece of program logic that
sequences together calls to the operations of a machine ($\iota$), or to other procedures
($[l]$; loops are implemented as recursive calls). Procedures also include a way to branch
on a condition. Procedures can be given a name and placed in the procedure heap C,
where they can be referenced from other procedures through the $[l]$ constructor. The
procedure heap together with a program rest (the currently executing procedure) makes
up the program that can be executed.

The meaning of executing a program is given by the indexed denotational semantics
shown on the right side of Figure 6. The meaning of the program is an action that is
constructed by sequencing operations. As programs can be infinite, the semantics are
indexed by the depth of procedure inclusion.
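
To make the role of the index concrete, here is a small Python sketch of a depth-indexed
interpreter for the meta-language. It is an illustration under stated assumptions, not the
paper's Coq definition: the tuple encoding of procedures, the finite state space, and the
example machine (OPS, CONDS, HEAP) are all hypothetical, and the sketch assumes that the
index is consumed by procedure calls and that index 0 denotes loop.

```python
STATES = range(5)  # hypothetical finite state space: counter values 0..4

ID = (lambda s: True, lambda s, t: t == s)
LOOP = (lambda s: True, lambda s, t: False)


def compose(a, b):
    """Sequential composition of two actions (p, g) and (p', g')."""
    (p, g), (p2, g2) = a, b
    return (lambda s: p(s) and all(p2(m) for m in STATES if g(s, m)),
            lambda s, t: any(g(s, m) and g2(m, t) for m in STATES))


def branch(c, a, b):
    """Action that behaves like a where c holds and like b elsewhere."""
    (p, g), (p2, g2) = a, b
    return (lambda s: (p(s) and c(s)) or (p2(s) and not c(s)),
            lambda s, t: (c(s) and g(s, t)) or (not c(s) and g2(s, t)))


def denote(ops, conds, heap, proc, depth):
    """Action of procedure `proc`, following calls to nesting depth < `depth`."""
    if depth == 0:
        return LOOP  # out of depth: no final state is reachable
    tag = proc[0]
    if tag == "nil":
        return ID
    if tag == "op":
        return ops[proc[1]]
    if tag == "call":
        # A call unfolds the callee's body with a smaller index.
        return denote(ops, conds, heap, heap[proc[1]], depth - 1)
    if tag == "seq":
        return compose(denote(ops, conds, heap, proc[1], depth),
                       denote(ops, conds, heap, proc[2], depth))
    if tag == "if":
        return branch(conds[proc[1]],
                      denote(ops, conds, heap, proc[2], depth),
                      denote(ops, conds, heap, proc[3], depth))
    raise ValueError(f"unknown procedure form: {tag}")


# Hypothetical machine with one operation and one conditional.
OPS = {"dec": (lambda s: s > 0, lambda s, t: t == s - 1)}
CONDS = {"pos": lambda s: s > 0}
# Procedure 0 is a loop written as a recursive call: while pos do dec.
HEAP = {0: ("if", "pos", ("seq", ("op", "dec"), ("call", 0)), ("nil",))}
```

With this encoding, counting down from 2 to 0 needs three nested unfoldings of procedure 0
plus the initial call, so the final state 0 is related to the initial state 2 at index 4
but not at index 3.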

We use the static semantics (Figure 7) to approximate the action of a procedure.
These semantics are similar to the denotational semantics of the meta-language, except
that the specifications of called procedures are looked up in the table ($\Psi$). This means
that the static semantics works by the programmer approximating the actions of
(specifying) the program, and then making sure that the actual action of the program is
within the specifications. These well-formed procedures are then grouped into a well-formed
module using the code rule, which forms the concept of a certified module
$M, \mathbb{L} \vdash C : \Psi$, where every procedure in C is well-formed under the
specifications in $\Psi$. The module also defines a library ($\mathbb{L}$), which is a set
of specifications of stubs, i.e. procedures that are used by the module, but are not in the
module. These stubs can then be eliminated by providing procedures that match the stubs
(see Section 3.2). For a program to be completely certified, all stubs must either be
considered valid primitives or eliminated. For a proof of partial correctness, please see
the TR.

3.2  Linking 

When we certify using modules, it will be very common that a module requires
stubs for the procedures of another module. Linking two modules together should
replace the stubs in both modules with the actual procedures that are now present in the
linked code. The general way to accomplish this is by the following linking lemma:

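Theorem 1 (Linking).

\[
\frac{M, \mathbb{L}_1 \vdash C_1 : \Psi_1 \quad M, \mathbb{L}_2 \vdash C_2 : \Psi_2 \quad C_1 \perp C_2 \quad \mathbb{L}_1 \perp \Psi_2 \quad \mathbb{L}_2 \perp \Psi_1 \quad \mathbb{L}_1 \perp \mathbb{L}_2}
     {M, ((\mathbb{L}_1 \cup \mathbb{L}_2) \setminus (\Psi_1 \cup \Psi_2)) \vdash C_1 \cup C_2 : \Psi_1 \cup \Psi_2}
\]

where $\Psi_1 \perp \Psi_2 \triangleq \forall l \in \mathrm{dom}(\Psi_1).\ (l \notin \mathrm{dom}(\Psi_2) \lor \Psi_1(l) = \Psi_2(l))$.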
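
The side conditions of the linking lemma are easy to check mechanically once modules are
represented as finite maps. The Python sketch below is an illustration only: modules are
plain dictionaries from labels to opaque values, the module contents in the usage note are
hypothetical, and the sketch assumes that the compatibility relation is the same
"absent or equal" check for code heaps, libraries, and spec heaps.

```python
def compatible(m1, m2):
    """Every label of m1 is either absent from m2 or mapped to the same value."""
    return all(label not in m2 or m1[label] == m2[label] for label in m1)


def link(lib1, code1, spec1, lib2, code2, spec2):
    """Link two certified modules given as (library, code heap, spec heap).

    Returns the (library, code, spec) triple of the linked module, or None
    when one of the four side conditions of the linking lemma fails.
    """
    side_conditions = (compatible(code1, code2)
                       and compatible(lib1, spec2)
                       and compatible(lib2, spec1)
                       and compatible(lib1, lib2))
    if not side_conditions:
        return None
    code = {**code1, **code2}
    spec = {**spec1, **spec2}
    # Stubs that are now implemented by one of the modules leave the library.
    library = {label: s for label, s in {**lib1, **lib2}.items()
               if label not in spec}
    return library, code, spec
```

For example, with a hypothetical module that implements as_request against a pt_set stub,
and a second module that implements pt_set against a mem_alloc stub, linking leaves only
mem_alloc in the library. If the stub specification of pt_set differs from the
specification of the implementation, link returns None, which is the situation that stub
strengthening is meant to repair.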

However,  the  above  rule  does  not  always  apply  immediately.  When  the  two  modules 
are  developed  independently,  it  is  possible  that  the  stubs  of  one  module  are  weaker  than 
the  specifications  of  the  procedures  that  will  replace  the  stubs,  which  breaks  the  linking 
lemma.  To  fix  this,  we  strengthen  the  library. 

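Theorem 2 (Stub Strengthening).

If $M, \mathbb{L} \vdash C : \Psi$, then for any $\mathbb{L}'$ s.t.
$\forall l \in \mathrm{dom}(\mathbb{L}).\ \mathbb{L}(l) \supseteq \mathbb{L}'(l)$ and
$\mathrm{dom}(\mathbb{L}') \cap \mathrm{dom}(\Psi) = \emptyset$, the following holds:
$M, \mathbb{L}' \vdash C : \Psi$.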

This theorem allows us to strengthen the stubs to match the specs of procedures,
enabling the linking lemma. Of course, if the specs of the real procedures are not stronger
than the specs of the stubs, then the procedures do not correctly implement what the
module expects, and linking is not possible.

3.3  The  Refinement  Framework 

Up to this point, we have only considered what happens to the code that is certified over
a single machine. However, the purpose of our framework is to facilitate multi-machine
verification. For this purpose, we construct the refinement framework that will allow
us to refine certified modules in one machine to certified modules in another. The most
general notion of refinement in our framework can be defined by the following:

Definition 1 (Certified Refinement).

A certified refinement from machine $M_A$ to machine $M_C$ is a pair of relations
$(T_C, T_\Psi)$ and a predicate over the abstract certified module, Acc, such that for all
$C_A$, $\Psi'_A$, $\Psi_A$, the following holds:

\[
\frac{M_A, \Psi'_A \vdash C_A : \Psi_A \quad Acc(M_A, \Psi'_A \vdash C_A : \Psi_A)}
     {M_C, T_\Psi(\Psi'_A) \vdash T_C(C_A) : T_\Psi(\Psi_A)}\ \text{(REFINE)}
\]

This definition is not a rule, but a template for other definitions. To define a refinement,
one has to provide the particular $T_C$, $T_\Psi$, and Acc together with the proof that the
rule holds. However, instead of trying to define these translations directly, we will
automatically generate them from the relations between the particular pairs of machines.

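Representation Refinement The only automatic refinement we will discuss in this
paper is the representation refinement. The representation refinement can be generated
for an abstract machine ($M_A$) and a concrete machine ($M_C$), where both use the same
operations and conditionals (i.e. $M_A.\Delta = M_C.\Delta$ and $M_A.\beta = M_C.\beta$), by
defining a relation ($\mathit{repr} : M_A.\Sigma \to M_C.\Sigma \to \mathit{Prop}$) between
the states of the two machines. Using repr, we can define our specification translation
function:

\[
T_{A\text{-}C}(p, g) \triangleq
(\lambda \mathbb{S}_C.\ \exists \mathbb{S}_A.\ \mathit{repr}\ \mathbb{S}_A\ \mathbb{S}_C \land p\ \mathbb{S}_A,\ \
 \lambda \mathbb{S}_C.\ \lambda \mathbb{S}'_C.\ \forall \mathbb{S}_A.\ \mathit{repr}\ \mathbb{S}_A\ \mathbb{S}_C \to
 \exists \mathbb{S}'_A.\ g\ \mathbb{S}_A\ \mathbb{S}'_A \land \mathit{repr}\ \mathbb{S}'_A\ \mathbb{S}'_C)
\]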

This operation creates a concrete action from an abstract action. Informally, it
works as follows. There must be at least one abstract state related to the starting
concrete state for which the abstract action applies. The action starting from state
$\mathbb{S}_C$ results in a set containing $\mathbb{S}'_C$ only if all related abstract
states for which the abstract action is valid result in sets of abstract states that
contain a state related to $\mathbb{S}'_C$. Essentially, the resulting concrete action is
an intersection of all abstract actions that do not fail.
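
The translation can be exercised on finite state sets. The following Python sketch is a
hedged illustration rather than the paper's definition: the abstract states (parities),
the concrete states (a 2-bit counter), the relation repr_rel, and the actions FLIP and INC
are all hypothetical, and the code follows the translation formula as stated in this
section.

```python
ABS_STATES = ("even", "odd")   # hypothetical abstract states
CONC_STATES = range(4)         # hypothetical concrete states: a 2-bit counter


def repr_rel(sa, sc):
    """Representation relation: a counter value is represented by its parity."""
    return sa == ("even" if sc % 2 == 0 else "odd")


def refine_action(action):
    """Translate an abstract action (p, g) into a concrete action."""
    p, g = action

    def pre(sc):
        # Some related abstract state satisfies the abstract precondition.
        return any(repr_rel(sa, sc) and p(sa) for sa in ABS_STATES)

    def rel(sc, sc2):
        # Every related abstract state can step to a state related to sc2.
        return all(any(g(sa, sa2) and repr_rel(sa2, sc2) for sa2 in ABS_STATES)
                   for sa in ABS_STATES if repr_rel(sa, sc))

    return (pre, rel)


# Abstract action: flip the parity. Concrete action: increment modulo 4.
FLIP = (lambda sa: True, lambda sa, sa2: sa != sa2)
INC = (lambda sc: True, lambda sc, sc2: sc2 == (sc + 1) % 4)
REFINED_FLIP = refine_action(FLIP)
```

Here the refined action relates 0 to both 1 and 3 (any state of the opposite parity), so
the concrete increment, which only relates 0 to 1, stays within it.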

To make this approach work, we require several properties over the machines and
the repr. First, the refined semantics of abstract operations have to be weaker than
the semantics of their concrete counterparts, i.e.
$\forall \iota \in M_A.\Delta.\ T_{A\text{-}C}(M_A(\iota)) \supseteq M_C(\iota)$.

Second, the refinement must preserve the branch choice, i.e. if the refined program
chooses the left branch, then the abstract program had to choose the left branch in all
states related by repr as well. This property is ensured by requiring the following:

\[
\forall b.\ \forall \mathbb{S}, \mathbb{S}'.\ (\exists \mathbb{S}_C.\ \mathit{repr}\ \mathbb{S}\ \mathbb{S}_C \land \mathit{repr}\ \mathbb{S}'\ \mathbb{S}_C) \to (M_A(b)\ \mathbb{S} \leftrightarrow M_A(b)\ \mathbb{S}')
\]


With these properties, we can define a valid refinement by the following lemma:

Lemma 1 (repr-refinement valid).

Given repr with proofs of the two properties above, the following is valid:

\[
\frac{M_A, \mathbb{L}_A \vdash C : \Psi_A}{M_C, T_\Psi(\mathbb{L}_A) \vdash C : T_\Psi(\Psi_A)}
\]

where $T_\Psi(\Psi) \triangleq \{ T_{A\text{-}C}(\Psi(l)) \mid l \in \mathrm{dom}(\Psi) \}$.

This refinement is interesting in that it preserves the code of the program, and performs
a point-wise refinement on specifications. Our actual work defines several other
refinement generators. One of these, code-preserving refinement, is included in the TR,
and is used as a stepping stone for the proof of Lemma 1. The Coq implementation features
more general versions of the refinements presented, as well as several others.

4  Certifying  C  Code 

Since  BabyVMM  is  written  in  C,  we  define  a  formal  specification  of  a  tiny  subset  of 
the  C  language  using  our  framework.  This  C  machine  will  be  parameterized  by  the 
specific  semantics  of  the  memory  model,  as  our  plan  required.  We  will  also  utilize  the 
C  machine  to  further  speed  up  the  creation  of  refinements. 

4.1  The  Semantics  of  C 

To  define  our  C  machine  in  terms  of  our  verification  framework,  we  need  to  give  it 
a  state  type,  a  list  of  operations,  and  the  semantics  of  those  operations  expressed  as 
actions.  All  of  these  are  given  in  Figure  8. 

The state of the C machine includes two components, the stack and the memory.
The stack is an abstract C stack that consists of a list of frames, which include call,
data, and return frames. In the current version, the stack is independent from memory
(one can think of it as existing within a statically defined part of the loaded kernel).
The memory model is a parameter of the C machine, meaning that it can make use of any
memory model as long as it defines load and store operations. The syntax of the C
machine is different from the usual definition, in that it relies on the meta-machine for
its control flow by using the meta-machine's call and branch. Our definition of C adds
atomic operations that perform state updates. Thus the operations include two types
of assignments, one to the stack and one to memory, and four operations to manipulate the
stack for call and return, which push and pop the frames.

Because control flow is provided by a standard machine, the code has to be modified
slightly. For example, a function call of the form r = f(x) will be split into a sequence
of three operations: fcall([x]); [f]; readret(r), the first setting up a call frame, the
second making the call, and the third doing the cleanup. Similarly, the body of the
function f(x){body; return(0);} will become args([x]); body; ret(0), as the function must
first move the arguments from the call frame into a data frame. Loops have to be desugared
into recursive procedures with branches. These modifications are entirely mechanical,
and hence we can claim that our machine supports desugared linearized C code.
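
Since the rewriting is mechanical, it can be sketched as three small functions. The Python
code below is an illustration only: the tuple encoding of operations, the function names,
and the treatment of expressions as opaque strings are assumptions of the sketch, not the
representation used in the Coq development.

```python
def desugar_call(result_var, func_label, arg_exprs):
    """r = f(x)  becomes  fcall([x]); [f]; readret(r)."""
    return [("fcall", list(arg_exprs)),   # set up the call frame
            ("call", func_label),         # make the call through the meta-machine
            ("readret", result_var)]      # clean up and store the result


def desugar_function(params, body_ops, return_expr):
    """f(x){body; return(e);}  becomes  args([x]); body; ret(e)."""
    return [("args", list(params)), *body_ops, ("ret", return_expr)]


def desugar_while(label, cond, body_ops):
    """A while loop becomes a recursive procedure with a branch.

    The result is a procedure-heap entry: if cond holds, run the body and
    call the procedure again; otherwise do nothing.
    """
    return {label: ("if", cond, [*body_ops, ("call", label)], [("nil",)])}
```

For instance, desugar_call("r", "f", ["x"]) yields the three-operation sequence described
above, and a loop is entered by issuing a call to the label returned by desugar_while.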

4.2  Refinement  in  C  machines 

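C machines at different abstraction layers differ only in their memory models, with
the stack being the same. We can use this fact to generate refinements between the C
machines using only the representation relation between memory models. This relation
($M_1 < M_2$) can be completely arbitrary as long as these conditions hold:

\[
\forall l, v.\ \ load(M_1, l) = v \to load(M_2, l) = v
\]
\[
\forall l, v, M'_1.\ \ (M'_1 = store(M_1, l, v)) \to (M'_1 < store(M_2, l, v))
\]

\[
\begin{array}{llcl}
\text{(State)} & \mathbb{S} & ::= & (M, S) \\
\text{(Memory)} & M & ::= & \text{(any type over which } load(M, l) \text{ and } store(M, l, w) \text{ are defined)} \\
\text{(Stack)} & S & ::= & nil \mid Call(\text{list } w) :: S \mid Data(\{v \leadsto w\}^{*}) :: S \mid Ret(w) :: S \\
\text{(Expressions)} & e & ::= & se \mid *(e) \\
\text{(StackExpr, Cond)} & se, b & ::= & w \mid v \mid binop(bop, e_1, e_2) \\
\text{(Binary Operators)} & bop & ::= & + \mid - \mid * \mid / \mid \% \mid == \mid < \mid <= \mid >= \mid > \mid != \mid \&\& \mid || \\
\text{(Variables)} & v & ::= & \text{(a decidable set of names)} \\
\text{(Words)} & w & ::= & n \ \text{(integers)} \\
\text{(Operation)} & \iota & ::= & v := e \mid *(e_{loc}) := e \mid fcall(\text{list } e) \mid ret(e) \mid args(\text{list } v) \mid readret(v)
\end{array}
\]

\[
\begin{array}{ll}
\text{Operation } (\iota) & \text{Action } (M(\iota)) \\
v := e &
(\lambda \mathbb{S}.\ \exists S', F, w.\ \mathbb{S}.S = Data(F) :: S' \land eval(e, \mathbb{S}) = w, \\
& \ \lambda \mathbb{S}, \mathbb{S}'.\ \exists S', F, w.\ \mathbb{S}.S = Data(F) :: S' \land eval(e, \mathbb{S}) = w\ \land \\
& \qquad \mathbb{S}'.M = \mathbb{S}.M \land \mathbb{S}'.S = Data(F\{v \leadsto w\}) :: S') \\
*(e_{loc}) := e &
(\lambda \mathbb{S}.\ \exists l, w.\ eval(e, \mathbb{S}) = w \land eval(e_{loc}, \mathbb{S}) = l \land \exists M'.\ M' = store(\mathbb{S}.M, l, w), \\
& \ \lambda \mathbb{S}, \mathbb{S}'.\ \exists l, w.\ eval(e, \mathbb{S}) = w \land eval(e_{loc}, \mathbb{S}) = l\ \land \\
& \qquad \mathbb{S}'.M = store(\mathbb{S}.M, l, w) \land \mathbb{S}'.S = \mathbb{S}.S) \\
fcall([e_1, \ldots, e_n]) &
(\lambda \mathbb{S}.\ \exists v_1, \ldots, v_n.\ eval(e_1, \mathbb{S}) = v_1 \land \ldots \land eval(e_n, \mathbb{S}) = v_n, \\
& \ \lambda \mathbb{S}, \mathbb{S}'.\ \exists v_1, \ldots, v_n.\ eval(e_1, \mathbb{S}) = v_1 \land \ldots \land eval(e_n, \mathbb{S}) = v_n\ \land \\
& \qquad \mathbb{S}'.M = \mathbb{S}.M \land \mathbb{S}'.S = Call([v_1, \ldots, v_n]) :: \mathbb{S}.S) \\
args([v_1, \ldots, v_n]) &
(\lambda \mathbb{S}.\ \exists w_1, \ldots, w_n, S'.\ \mathbb{S}.S = Call([w_1, \ldots, w_n]) :: S', \\
& \ \lambda \mathbb{S}, \mathbb{S}'.\ \exists w_1, \ldots, w_n, S'.\ \mathbb{S}.S = Call([w_1, \ldots, w_n]) :: S'\ \land \\
& \qquad \mathbb{S}'.M = \mathbb{S}.M \land \mathbb{S}'.S = Data(\{v_1 \leadsto w_1, \ldots, v_n \leadsto w_n\}) :: S') \\
readret(v) &
(\lambda \mathbb{S}.\ \exists S', w, D.\ \mathbb{S}.S = Ret(w) :: Data(D) :: S', \\
& \ \lambda \mathbb{S}, \mathbb{S}'.\ \exists S', w, D.\ \mathbb{S}.S = Ret(w) :: Data(D) :: S'\ \land \\
& \qquad \mathbb{S}'.M = \mathbb{S}.M \land \mathbb{S}'.S = Data(D\{v \leadsto w\}) :: S') \\
ret(e) &
(\lambda \mathbb{S}.\ \exists w.\ eval(e, \mathbb{S}) = w,\ \
 \lambda \mathbb{S}, \mathbb{S}'.\ \mathbb{S}'.M = \mathbb{S}.M \land \mathbb{S}'.S = Ret(eval(e, \mathbb{S})) :: \mathbb{S}.S)
\end{array}
\]

\[
eval(e, \mathbb{S}) ::=
\begin{cases}
w & \text{if } e = w \\
\mathbb{S}.S(v) & \text{if } e = v \\
load(\mathbb{S}.M, eval(e_1, \mathbb{S})) & \text{if } e = *(e_1) \\
b(eval(e_1, \mathbb{S}), eval(e_2, \mathbb{S})) & \text{if } e = binop(b, e_1, e_2)
\end{cases}
\qquad
\Upsilon(b) ::= \lambda \mathbb{S}.\ eval(b, \mathbb{S}) \neq 0
\]

Fig. 8. Primitive C-like machine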

The above properties make sure that the load and store operations of memory behave
in a similar way. We construct the repr between C machines as follows:

\[
\mathit{repr} \triangleq \lambda \mathbb{S}_A, \mathbb{S}_C.\ (\mathbb{S}_A.S = \mathbb{S}_C.S) \land (\mathbb{S}_A.M < \mathbb{S}_C.M)
\]

Using the properties of load and store, we show the properties needed for repr-refinement
to work: that for every operation $\iota$ in the C machine,
$T_{A\text{-}C}(M_{M_1}(\iota)) \supseteq M_{M_2}(\iota)$, and that repr preserves
branching. For details, please see the TR. Now we can define the actual refinement rule
for C machines:

Definition    Value                   Description
PGSIZE        4096                    Number of bytes per page
NPAGES        unspecified             Number of phys. pages in memory
VPAGES        unspecified             Maximum page number of a virtual address
Pg(addr)      addr / PGSIZE           gets page of address
Off(addr)     addr % PGSIZE           offset into page of address
LowPg(pg)     0 < pg < NPAGES         valid page in low memory area
HighPg(pg)    NPAGES < pg < VPAGES    valid page in high memory area

Fig. 9. Page Definitions

Corollary 1 (C Refinement).

For any two memory models M1 and M2, s.t. M1 < M2, the following refinement works for C


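machines instantiated with $M_1$ and $M_2$:

\[
\frac{M_{M_1}, \mathbb{L} \vdash C : \Psi}{M_{M_2}, T_\Psi(\mathbb{L}) \vdash C : T_\Psi(\Psi)}
\]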

Thus we know that if we have two C machines that have related memory models,
then we have a working refinement between the two machines. Our next step is to
show the relations between all the memory models shown in our plan (in Figure 3).

5  Virtual  Memory  Manager 

At  this  point,  we  have  all  the  machinery  necessary  to  start  building  our  certified  memory 
manager  according  to  the  plan.  The  first  step  is  to  formally  define  and  give  relations 
between  the  memory  models  that  we  will  use  in  our  certification.  Then  we  will  certify 
the  code  of  the  modules  that  make  up  the  VMM.  These  modules  will  then  be  refined 
and  linked  together,  resulting  in  the  conclusion  that  the  entire  BabyVMM  is  certified. 

5.1  The  Memory  Models 

Because  of  the  space  limit,  we  will  only  formally  present  the  PMAP  memory  model 
(Figures  9  and  10).  For  the  definitions  of  others,  please  see  the  TR. 

The state of the PMAP memory has three components: the actual memory store D,
the allocation table A, and the first-class pagemap PM. The memory store contains the
actual data in memory, indexed by physical addresses. The allocation table A keeps
track of which pages are allocated and which are not. This allocation information is
abstract: it does not have to correspond to the actual allocation table used within the
VMM.  For  example,  the  hardware  page  tables,  which  this  model  abstracts,  are  still  in 
memory,  but  are  hidden  by  the  allocation  table.  The  page  map  is  the  abstract  mapping 
of  virtual  pages  to  physical  pages,  which  purposefully  skips  all  addresses  mappable  to 
physical  memory.  This  mapping  is  used  in  loads  and  stores  of  the  memory  model,  which 
use  the  trans  predicate  to  translate  addresses  by  looking  up  mappings  in  the  PM. 
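As an illustration only, the address translation and the guarded loads and stores described above can be modeled in a few lines of Python. This is a sketch, not the Coq development: the values chosen for NPAGES and VPAGES, the Stuck exception, the PMapState class, and the convention that unwritten words read as 0 are all assumptions made for the example.

```python
from dataclasses import dataclass, field

PGSIZE = 4096   # bytes per page, as in the page definitions
NPAGES = 16     # assumed for this sketch; the source leaves it unspecified
VPAGES = 32     # assumed for this sketch; the source leaves it unspecified


class Stuck(Exception):
    """Raised where the model's trans/load/store is undefined (hypothetical name)."""


def pg(addr):
    return addr // PGSIZE


def off(addr):
    return addr % PGSIZE


def low_pg(p):
    return 0 <= p < NPAGES


def high_pg(p):
    return NPAGES <= p < VPAGES


@dataclass(frozen=True)
class PMapState:
    D: dict = field(default_factory=dict)    # physical address -> word
    A: dict = field(default_factory=dict)    # low page -> allocated flag
    PM: dict = field(default_factory=dict)   # high (virtual) page -> low page


def trans(m, va):
    """Translate through the page map for high pages; identity otherwise."""
    p = pg(va)
    if high_pg(p):
        if p not in m.PM:
            raise Stuck(f"virtual page {p} is unmapped")
        return m.PM[p] * PGSIZE + off(va)
    return va


def _checked(m, va):
    """Translate va and require that the target page is allocated."""
    pa = trans(m, va)
    if m.A.get(pg(pa)) is not True:
        raise Stuck(f"page {pg(pa)} is not allocated")
    return pa


def load(m, va):
    # Assumption of the sketch: words that were never written read as 0.
    return m.D.get(_checked(m, va), 0)


def store(m, va, w):
    """Return a new state; only the store D changes, A and PM are kept."""
    pa = _checked(m, va)
    return PMapState({**m.D, pa: w}, m.A, m.PM)
```

In this sketch a store through a mapped high address is visible through the corresponding low address as well, because both translate to the same physical cell.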

The PMAP model relies on the stub library ($\mathcal{L}_{PMAP}$) for updating auxiliary data
structures. There are two stubs for memory allocation, mem_alloc and mem_free. Their
specs show how they modify the allocation table, and how allocating a page is
non-deterministic and may potentially return any free page. The other two stubs, pt_set
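and pt_lookup update and look up page map entries; their specs are straightforward.

(Global Storage System)    $M ::= (D, A, PM)$
(Allocatable Memory)       $D ::= \{addr \leadsto w \mid LowPg(Pg(addr)) \wedge addr \% 8 = 0\}^*$
(Page Allocation Table)    $A ::= \{pg \leadsto bool \mid LowPg(pg)\}^*$
(Page Map)                 $PM ::= \{pg \leadsto pg' \mid HighPg(pg)\}^*$

Notation           Definition

load(M, va)        $M.D(trans(M, va))$   if $M.A(Pg(trans(M, va))) = true$

store(M, va, w)    $(M.D\{trans(M, va) \leadsto w\},\ M.A,\ M.PM)$   if $M.A(Pg(trans(M, va))) = true$

trans(M, va)       $M.PM(Pg(va)) * PGSIZE + Off(va)$   if $HighPg(Pg(va))$
                   $va$   otherwise

Label        Specification

mem_alloc    $(\lambda \mathbb{S}.\ \exists S'.\ \mathbb{S}.S = Call([]) :: S',$
             $\lambda \mathbb{S}, \mathbb{S}'.\ \exists S'.\ (\mathbb{S}.S = Call([]) :: S') \wedge ((\mathbb{S}'.S = Ret(0) :: S' \wedge \mathbb{S}'.M = \mathbb{S}.M) \vee$
             $(\exists pg.\ \mathbb{S}'.S = Ret(pg) :: S' \wedge \mathbb{S}'.M.A = \mathbb{S}.M.A\{pg \leadsto true\} \wedge \mathbb{S}'.M.PM = \mathbb{S}.M.PM \wedge$
             $\mathbb{S}.M.A(pg) = false \wedge \forall l.\ \mathbb{S}.M.A(Pg(l)) = true \rightarrow (\mathbb{S}'.M.D(l) = \mathbb{S}.M.D(l)))))$

mem_free     $(\lambda \mathbb{S}.\ \exists S', pg.\ \mathbb{S}.S = Call([pg]) :: S' \wedge \mathbb{S}.M.A(pg) = true,$
             $\lambda \mathbb{S}, \mathbb{S}'.\ \exists S', pg.\ \mathbb{S}.S = Call([pg]) :: S' \wedge \mathbb{S}'.S = Ret(0) :: S' \wedge \mathbb{S}'.M.PM = \mathbb{S}.M.PM \wedge$
             $\mathbb{S}'.M.A = \mathbb{S}.M.A\{pg \leadsto false\} \wedge \forall l.\ \mathbb{S}'.M.A(Pg(l)) = true \rightarrow \mathbb{S}'.M.D(l) = \mathbb{S}.M.D(l))$

pt_set       $(\lambda \mathbb{S}.\ \exists S', vp, pp.\ \mathbb{S}.S = Call([vp, pp]) :: S' \wedge HighPg(vp) \wedge LowPg(pp),$
             $\lambda \mathbb{S}, \mathbb{S}'.\ \exists S', vp, pp.\ \mathbb{S}.S = Call([vp, pp]) :: S' \wedge \mathbb{S}'.S = Ret(0) :: S' \wedge \mathbb{S}'.M.A = \mathbb{S}.M.A \wedge$
             $\mathbb{S}'.M.PM = \mathbb{S}.M.PM\{vp \leadsto pp\} \wedge \forall l.\ \mathbb{S}'.M.A(Pg(l)) = true \rightarrow \mathbb{S}'.M.D(l) = \mathbb{S}.M.D(l))$

pt_lookup    $(\lambda \mathbb{S}.\ \exists S', vp.\ \mathbb{S}.S = Call([vp]) :: S' \wedge HighPg(vp),$
             $\lambda \mathbb{S}, \mathbb{S}'.\ \exists S', vp.\ \mathbb{S}.S = Call([vp]) :: S' \wedge \mathbb{S}'.S = Ret(\mathbb{S}.M.PM(vp)) :: S' \wedge \mathbb{S}'.M = \mathbb{S}.M)$

Fig. 10. PMAP Memory Model ($M_{PMAP}$) and Library ($\mathcal{L}_{PMAP}$)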
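To make the stub specifications more concrete, the following Python sketch models the four stubs as pure functions on the allocation table and the page map. It is an illustration under assumptions rather than the verified library: the values of NPAGES and VPAGES, the `choose` parameter that stands in for the non-deterministic choice of a free page, and the decision never to hand out page 0 (because a return value of 0 signals failure) are all choices made for this example.

```python
import random

NPAGES = 8    # assumed for this sketch
VPAGES = 16   # assumed for this sketch


def low_pg(p):
    return 0 <= p < NPAGES


def high_pg(p):
    return NPAGES <= p < VPAGES


def mem_alloc(A, choose=random.choice):
    """Return (ret, A'). ret == 0 means no page was allocated and A' is A.

    `choose` picks among the free pages, modeling the non-determinism of the
    spec. Page 0 is never returned here, since 0 is the failure value.
    """
    free = [p for p in sorted(A) if p != 0 and not A[p]]
    if not free:
        return 0, A
    p = choose(free)
    return p, {**A, p: True}


def mem_free(A, p):
    """Precondition: p is currently allocated. Returns (0, A')."""
    if not A.get(p, False):
        raise ValueError(f"page {p} is not allocated")
    return 0, {**A, p: False}


def pt_set(PM, vp, pp):
    """Precondition: vp is a high page and pp a low page. Returns (0, PM')."""
    if not (high_pg(vp) and low_pg(pp)):
        raise ValueError("pt_set needs a high virtual page and a low physical page")
    return 0, {**PM, vp: pp}


def pt_lookup(PM, vp):
    """Precondition: vp is a high page. The page map is left unchanged."""
    if not high_pg(vp):
        raise ValueError("pt_lookup needs a high virtual page")
    return PM[vp]
```

Each function returns a new table instead of mutating its argument, which mirrors the way the specifications relate a state before the call to a state after it.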

5.2  Relation  between  Memory  Models 

Our plan calls for creation of the refinements between the memory models. In Section 4.2,
we have shown that we can generate a valid refinement by creating a relation
between the memory states, and then showing that abstract loads and stores are
preserved by this relation. These relations and proofs of preserving the memory operations
are fairly lengthy and quite technical, and thus we leave the mathematical detail to our
Coq implementation, opting for a visual description shown in Figure 11.

On  the  right  is  a  state  of  the  hardware  memory,  whose  operational  semantics  gives 
little  protection  from  accessing  data.  Some  areas  of  memory  are  dangerous,  some  are 
empty,  others  contain  data,  including  the  allocation  tables  and  page  tables.  This  memory 
relates to the ALE memory model by abstracting out the memory allocation table. This
allocation table now offers protection against accessing both the unallocated space and the
space that seems unallocated but is dangerous to use (marked by wavy lines). An example
of such an area is the allocation table itself: the ALE model hides the table, making it
appear to be unusable. The ALE mem_alloc primitive will never allocate pages from
these wavy areas, protecting them without complicating the memory model.

The relation between the PMAP and ALE models shows that the abstract pagemap
of the PMAP model is actually contained within the specific area of the ALE model. The
relation  makes  sure  that  the  mappings  contained  in  the  PMAP’s  pagemap  are  the  same 
as the translation results of the ALE’s page table structures. To protect the in-memory
page tables, the relation hides the page table memory area from the PMAP model, using
the same trick as the one used to protect the allocation tables in the ALE model.

[Figure 11 shows the memory layouts of the AS, PMAP, ALE, and PE/HW models side by
side. Address labels include 0x0A0, 0x100, 0x150, 0x200, NPAGES, and VPAGES. Region
labels: Reserved for Hardware, Allocatable Space, Kernel Code, Memory Allocation Table,
Page Tables, and High Addresses (only accessible as virtual pages). Legend: data
present; allocated with data; appears free but unavailable; page table data; free
space; dangerous; memory allocation table data.]

Fig. 11. Relation between Memory Models

The relation between the AS and PMAP models collapses PMAP’s memory and the
page maps into a single memory-like structure in the AS model. This is mostly
accomplished by chaining the translation mechanism with the storage mechanism. However,
to make this work, it is imperative that the relation ensures that no two pages of the
AS model ever map to the same physical page in the PMAP model. This means that
all physical pages that are mapped from the high addresses become hidden in the AS
model. We will not go into detail about the preservation of loads and stores, as these
proofs are mostly straightforward, given the relations.
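One way to read this relation operationally is sketched below in Python. The sketch is an interpretation for illustration, not the relation used in the Coq proofs: the function names, the dictionary encoding of D, A, and PM, and the small value of PGSIZE-related constants are assumptions. It checks that the page map is injective and then builds a single address-indexed memory in which a physical page that is the target of a mapping is visible only at its high (virtual) address.

```python
PGSIZE = 4096


def page_map_is_injective(PM):
    """True if no two virtual pages map to the same physical page."""
    targets = list(PM.values())
    return len(targets) == len(set(targets))


def as_view(D, A, PM):
    """Collapse a PMAP-style state (D, A, PM) into one flat memory.

    D:  physical address -> word
    A:  low page -> allocated flag
    PM: high page -> low page
    Mapped physical pages are hidden at their low addresses and appear only
    at the virtual addresses that map to them.
    """
    if not page_map_is_injective(PM):
        raise ValueError("two virtual pages share a physical page")
    inverse = {pp: vp for vp, pp in PM.items()}
    view = {}
    for pa, w in D.items():
        p, o = divmod(pa, PGSIZE)
        if not A.get(p, False):
            continue                      # unallocated data is not visible
        if p in inverse:
            view[inverse[p] * PGSIZE + o] = w
        else:
            view[pa] = w
    return view
```

Injectivity is what makes the inverse map well defined; without it, one physical cell would have to appear at two addresses of the collapsed memory, and a store through one of them would silently change the other.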

5.3 Certification and Linking of BabyVMM

We  have  verified  all  the  functions  of  the  virtual  memory  on  the  appropriate  memory 
models.  This  means  that  we  have  defined  appropriate  specifications  for  our  functions, 
and  certified  our  code.  We  also  make  an  assumption  that  a  kernel  is  certified  in  the  AS 
model.  The  result  is  the  following  certified  modules: 

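$M_{PE}, \mathcal{L}_{PE} \vdash C^{mem} : \Psi_{PE}^{mem}$        $M_{PMAP}, \mathcal{L}_{PMAP} \vdash C^{as} : \Psi_{PMAP}^{as}$        $M_{PD}, \mathcal{L}_{PD} \vdash C^{meminit} : \Psi_{PD}^{meminit}$

$M_{ALE}, \mathcal{L}_{ALE} \vdash C^{pt} : \Psi_{ALE}^{pt}$        $M_{AS}, \mathcal{L}_{AS} \vdash C^{kernel} : \Psi_{AS}^{kernel}$        $M_{ALD}, \mathcal{L}_{ALD} \vdash C^{ptinit} : \Psi_{ALD}^{ptinit}$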

However, the init function makes calls to other procedures that are certified in
more abstract machines. Thus to certify init over the $M_{HW}$ machine, we will need to
create stubs for these procedures, which have to be carefully crafted to be valid for the
refined specifications of the actual procedures. Thus, the specification of init results
in the following:

$M_{HW},\ \mathcal{L}_{HW} \cup \{kernel\_init \leadsto a_{kernel\_init},\ mem\_init \leadsto a_{mem\_init},\ pt\_init \leadsto a_{pt\_init}\} \vdash C^{init} : \Psi_{HW}^{init}$

With  all  the  modules  verified,  we  proceed  to  link  them  together.  The  first  step  is  to 
refine  the  kernel.  We  use  our  AS-PMAP  refinement  rule  to  get  the  refined  module: 

$M_{PMAP},\ T_{AS\text{-}PMAP}(\mathcal{L}_{AS}) \vdash C^{kernel} : T_{AS\text{-}PMAP}(\Psi_{AS}^{kernel})$

Then we show that the specs of functions and the primitives of the PMAP machine are
proper implementations of the refined specs of $\mathcal{L}_{AS}$; more formally,
$T_{AS\text{-}PMAP}(\mathcal{L}_{AS}) \supseteq \mathcal{L}_{PMAP} \cup \Psi_{PMAP}^{as}$. Using library strengthening and the linking lemma, we produce a
certified module that is the union of the refined kernel and address space library:

$M_{PMAP},\ \mathcal{L}_{PMAP} \vdash C^{kernel} \cup C^{as} : T_{AS\text{-}PMAP}(\Psi_{AS}^{kernel}) \cup \Psi_{PMAP}^{as}$

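Applying this process to all the modules over all refinements, we link all parts of the
code, except init, certified over $M_{HW}$. For readability, we hide chains of refinements.
For example, $T_{AS\text{-}HW}$ is actually $T_{AS\text{-}PMAP} \circ T_{PMAP\text{-}ALE} \circ T_{ALE\text{-}PE} \circ T_{PE\text{-}HW}$.

$M_{HW},\ \mathcal{L}_{HW} \vdash C^{kernel} \cup C^{as} \cup C^{pt} \cup C^{mem} \cup C^{meminit} \cup C^{ptinit} :$

$T_{AS\text{-}HW}(\Psi_{AS}^{kernel}) \cup T_{PMAP\text{-}HW}(\Psi_{PMAP}^{as}) \cup T_{ALE\text{-}HW}(\Psi_{ALE}^{pt}) \cup$

$T_{PE\text{-}HW}(\Psi_{PE}^{mem}) \cup T_{PD\text{-}HW}(\Psi_{PD}^{meminit}) \cup T_{ALD\text{-}HW}(\Psi_{ALD}^{ptinit})$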

To  get  the  initialization  to  link  with  the  refined  module,  we  must  make  sure  that  the 
stubs  that  we  have  developed  for  init  are  compatible  with  the  refined  specifications  of 
the  actual  functions.  This  means  that  we  prove  the  following: 

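$a_{kernel\_init} \supseteq T_{AS\text{-}HW}(\Psi_{AS}^{kernel})(kernel\_init)$

$a_{mem\_init} \supseteq T_{PD\text{-}HW}(\Psi_{PD}^{meminit})(mem\_init)$        $a_{pt\_init} \supseteq T_{ALD\text{-}HW}(\Psi_{ALD}^{ptinit})(pt\_init)$

Using these properties, we apply stub strengthening to the init module:

$M_{HW},\ \mathcal{L}_{HW} \cup T_{AS\text{-}HW}(\Psi_{AS}^{kernel}) \cup T_{PD\text{-}HW}(\Psi_{PD}^{meminit}) \cup T_{ALD\text{-}HW}(\Psi_{ALD}^{ptinit}) \vdash C^{init} : \Psi_{HW}^{init}$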

This  certification  is  now  linkable  to  the  rest  of  the  VMM  and  kernel,  to  produce  the 
final  result  that  we  need: 

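$M_{HW},\ \mathcal{L}_{HW} \vdash C^{kernel} \cup C^{as} \cup C^{pt} \cup C^{mem} \cup C^{meminit} \cup C^{ptinit} \cup C^{init} :$

$T_{AS\text{-}HW}(\Psi_{AS}^{kernel}) \cup T_{PMAP\text{-}HW}(\Psi_{PMAP}^{as}) \cup T_{ALE\text{-}HW}(\Psi_{ALE}^{pt}) \cup$

$T_{PE\text{-}HW}(\Psi_{PE}^{mem}) \cup T_{PD\text{-}HW}(\Psi_{PD}^{meminit}) \cup T_{ALD\text{-}HW}(\Psi_{ALD}^{ptinit}) \cup \Psi_{HW}^{init}$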

This result means that given a certified kernel in the AS model, we can refine it to
the HW model of memory by linking it with the VMM implementation. Furthermore, it is
safe to start this kernel by calling the init function, which will perform the setup, and
then call the kernel_init function, the entry point of the high-level kernel.




6  Coq  Implementation 


All  portions  of  this  system  have  been  implemented  in  the  Coq  Proof  Assistant[5].  The 
portions  of  the  implementation  directly  related  to  the  BabyVMM  verification,  including 
C  machines,  refinements,  specs,  and  related  proofs  (excluding  frameworks)  took  about 
3  person-months  to  verify.  The  approximate  line  counts  for  unoptimized  proof  are: 

-  Verification  and  refinement  framework  -  3000  lines 

-  Memory  models  -  200-400  lines  each 

-  repr  and  compatibility  between  models  -  200-400  lines  each 

-  Compatibility  of  stubs  and  implementation  -  200-400  lines  per  procedure 

-  Code  verification  -  less  than  200  lines  per  procedure  (half  of  it  boilerplate). 


7  Related  Work  and  Conclusion 


The  work  presented  here  is  a  continuation  of  the  work  on  Hoare-logic  frameworks  for 
verification  of  system  software.  The  verification  framework  evolved  from  SCAP[8]  and 
GCAP[3]. Although our framework does not mention separation logic[17], information
hiding [16], and local action[4] explicitly, these methods had great influence on the
design of the meta-language and the refinements. The definition of repr generalizes the
work on a certified garbage collector[15] to fit our concept of refinement. The project’s
motivation is the modular and reusable certification of the CertiKOS kernel[10].

The best-known work in OS verification is L4.verified[12, 6], which has shown a
complete verification of an OS kernel. Their methodology is different, but they have
considered verification of virtual memory[13, 14]. However, their current kernel
verification does not abstract virtual memory, maintaining only the invariant that allows the
kernel to function, and leaving the details to the user level.

The Verisoft project [9, 2, 1, 11, 18] is the work that is closest to ours. We both aim
for pervasive verification of an OS by doing foundational verification of all components.
Both  works  utilize  multiple  machines,  and  require  linking.  As  both  projects  aim  for 
certification  of  a  kernel,  both  have  to  handle  virtual  memory.  Although  Verisoft  uses 
multiple  machine  models,  they  use  them  sparingly.  For  example,  the  entire  microkernel, 
excluding  assembly  code,  is  specified  in  a  single  layer,  with  correctness  shown  as  a 
single  simulation  theorem  between  the  concurrent  user  thread  model  (CVM)  and  the 
instruction set. The authors mention that the proof of correctness is a more complex
part of Verisoft. Such a monolithic approach is sensitive to local modifications, where
a small change in one part of the microkernel may require changes to the entire proof.

Our  method  for  verification  defines  many  more  layers,  with  smaller  refinement 
proofs  between  them,  and  composes  them  to  produce  larger  abstractions,  ensuring  that 
the  verification  is  more  reusable  and  modular.  Our  new  framework  enables  us  to  create 
abstraction  layers  with  less  overhead,  reducing  the  biggest  obstacle  to  our  approach.  We 
have  demonstrated  the  practicality  of  our  approach  by  certifying  BabyVMM,  a  small 
virtual  memory  manager  running  on  simplified  hardware,  using  a  new  layer  for  every 
non-trivial abstraction we could find.

Abstract: This article describes a novel quantitative proof
technique for the modular and local verification of lock-freedom.
In contrast to proofs based on temporal rely-guarantee
requirements, this new quantitative reasoning method can be directly
integrated in modern program logics that are designed for the
verification of safety properties. Using a single formalism for
verifying memory safety and lock-freedom allows a combined
correctness proof that verifies both properties simultaneously.

This article presents one possible formalization of this
quantitative proof technique by developing a variant of concurrent
separation logic (CSL) for total correctness. To enable
quantitative reasoning, CSL is extended with a predicate for affine tokens
to account for, and provide an upper bound on, the number of
loop iterations in a program. Lock-freedom is then reduced to
total-correctness proofs. Quantitative reasoning is demonstrated
in detail, both informally and formally, by verifying the
lock-freedom of Treiber’s non-blocking stack. Furthermore, it is shown
how the technique is used to verify the lock-freedom of more
advanced shared-memory data structures that use
elimination-backoff schemes and hazard-pointers.

I.  Introduction 

The  efficient  use  of  multicore  and  multiprocessor  systems 
requires  high  performance  shared-memory  data  structures. 
Performance issues with traditional lock-based synchronization
have generated increasing interest in non-blocking shared-
memory data structures. In many scenarios, non-blocking data
structures  outperform  their  lock-based  counterparts  [1],  [2]. 
However,  their  optimistic  approach  to  concurrency  complicates 
reasoning  about  their  correctness. 

A  non-blocking  data  structure  should  guarantee  that  any 
sequence  of  concurrent  operations  that  modify  or  access  the 
data  structure  do  so  in  a  consistent  way.  Such  a  guarantee  is  a 
safety property which is implied by linearizability [3]. Additionally,
a non-blocking data structure should guarantee certain
liveness properties, which ensure that desired events eventually
occur when the program is executed, independent of thread
contention or the whims of the scheduler. These properties are
ensured by progress conditions such as obstruction-freedom,
lock-freedom, and wait-freedom [4], [5] (see §II). In general,
it  is  easier  to  implement  the  data  structure  efficiently  if  the 
progress  guarantees  it  makes  are  weaker.  Lock-freedom  has 
proven  to  be  a  sweet  spot  that  provides  a  strong  progress 
guarantee  and  allows  for  elegant  and  efficient  implementations 
in  practice  [6],  [7],  [8],  [9]. 

The formal verification of practical lock-free data structures
is  an  interesting  problem  because  of  their  relevance  and  the 
challenges  they  bear  for  current  verification  techniques:  They 
employ  fine-grained  concurrency,  shared-memory  pointer-based 
data  structures,  pointer  manipulation,  and  control  flow  that 
depends  on  shared  state. 


Classically, verification of lock-freedom is reduced to
model-checking liveness properties on whole-program execution
traces [10], [11], [12]. Recently, Gotsman et al. [13] have
argued  that  lock-freedom  can  be  reduced  to  modular,  thread- 
local  termination  proofs  of  concurrent  programs  in  which 
each  thread  only  executes  a  single  data-structure  operation. 
Termination  is  then  proven  using  a  combination  of  concurrent 
separation  logic  (CSL)  [14]  and  temporal  trace-based  rely- 
guarantee  reasoning.  In  this  way,  proving  lock-freedom  is 
reduced  to  a  finite  number  of  termination  proofs  which  can  be 
automatically found. However, as we show in §II, this method
is  not  intended  to  be  applied  to  some  lock-free  data  structures 
that  are  used  in  practice. 

These  temporal-logic  based  proofs  of  lock-freedom  are  quite 
different  from  informal  lock-freedom  proofs  of  shared  data 
structures in the systems literature (e.g., [7], [9]). The informal
argument  is  that  the  failure  to  make  progress  by  a  thread  is 
always  caused  by  successful  progress  in  an  operation  executed 
by  another  thread.  In  this  article,  we  show  that  this  intuitive 
reasoning  can  be  turned  into  a  formal  proof  of  lock-freedom. 
To  this  end,  we  introduce  a  quantitative  compensation  scheme 
in  which  a  thread  that  successfully  makes  progress  in  an 
operation  has  to  logically  provide  resources  to  other  threads 
to  compensate  for  possible  interference  it  may  have  caused. 
Proving  that  all  operations  of  a  data  structure  adhere  to  such 
a  compensation  scheme  is  a  safety  property  which  can  be 
formalized  using  minor  extensions  of  modern  program  logics 
for  fine-grained  concurrent  programs  [14],  [15],  [16],  [17]. 

We  formalize  one  such  extension  in  this  article  using 
CSL. We chose CSL because it has a relatively simple
meta-theory and can elegantly deal with many challenges arising in
the  verification  of  concurrent,  pointer-manipulating  programs. 
Parkinson  et  al.  [18]  have  shown  that  CSL  can  be  used  to 
derive  modular  and  local  safety  proofs  of  non-blocking  data 
structures.  The  key  to  these  proofs  is  the  identification  of  a 
global  resource  invariant  on  the  shared-data  structure  that  is 
maintained  by  each  atomic  command.  However,  this  technique 
only  applies  to  safety  properties  and  the  authors  state  that  they 
are  “investigating  adding  liveness  rules  to  separation  logic  to 
capture properties such as obstruction/lock/wait-freedom”.

We  show  that  it  is  not  necessary  to  add  “liveness  rules”  to 
CSL to verify lock-freedom. As in Atkey’s separation logic for
quantitative reasoning [19], we extend CSL with a predicate
for affine tokens to account for, and provide an upper bound
on, the number of loop iterations in a program. In this way,
we obtain the first separation logic for total correctness of
concurrent programs.

Strengthening the result of Gotsman et al. [13], we first
show  that  lock-freedom  can  be  reduced  to  the  total  correctness 
of  concurrent  programs  in  which  each  thread  executes  a  finite 
number  of  data-structure  operations.  We  then  prove  the  total 
correctness  of  these  programs  using  our  new  quantitative 
reasoning  technique  and  a  quantitative  resource  invariant  in 
the  sense  of  CSL.  Thus  the  proof  of  the  liveness  property  of 
being  lock-free  is  reduced  to  the  proof  of  a  stronger  safety 
property.  The  resulting  proofs  are  simple  extensions  of  memory- 
safety  proofs  in  CSL  and  only  use  standard  techniques  such 
as  auxiliary  variables  [20]  and  read  permissions  [21]. 

We  demonstrate  the  practicality  of  our  compensation-based 
quantitative  method  by  verifying  the  lock-freedom  of  Treiber’s 
non-blocking  stack  (§VI).  We  further  show  that  the  technique 
applies  to  many  lock-free  data  structures  by  discussing  the 
verification  of  more  complex  shared-memory  data  structures 
such  as  Michael  and  Scott’s  non-blocking  queue  [7],  Hendler 
et  al.’s  non-blocking  stack  with  elimination  backoff  [9],  and 
Michael’s  non-blocking  hazard-pointer  data  structures  [8] 
(§VII). 

Our  method  is  a  clean  and  intuitive  modular  verification 
technique that works correctly for shared-memory data
structures that have access to thread IDs or the total number of
threads  in  the  system  (see  §II  for  details).  It  can  not  only  be 
applied  to  verify  total  correctness  but  also  to  directly  prove 
liveness  properties  or  to  verify  termination-sensitive  contextual 
refinement.  Automation  of  proofs  in  concurrent  separation  logic 
is  an  orthogonal  issue  which  is  out  of  the  scope  of  this  paper. 
It  would  require  the  automatic  generation  of  loop  invariants 
and  resource  invariants.  Assuming  that  they  are  in  place,  the 
automation  of  the  proofs  can  rely  on  backward  reasoning  and 
linear  programming  as  described  by  Atkey  [19]. 

In  summary,  we  make  the  following  contributions. 

1)  We  introduce  a  new  compensation-based  quantitative 
reasoning  technique  for  proving  lock-freedom  of  non- 
blocking  data  structures.  (§III  and  §V) 

2) We formalize our technique using a novel extension of
CSL  for  total  correctness  and  prove  the  soundness  of 
this  logic.  (§IV,  §V,  and  §VI) 

3)  We  demonstrate  the  effectiveness  of  our  approach  by 
verifying  the  lock-freedom  of  Treiber’s  non-blocking 
stack  (§VI),  Michael  and  Scott’s  lock-free  queue,  Hendler 
et  al.’s  lock-free  stack  with  elimination  backoff,  and 
Michael’s  lock-free  hazard-pointer  stack. 

In  §VII,  we  discuss  how  quantitative  reasoning  can  verify 
the  lock-freedom  of  data  structures  such  as  maps  and  sets, 
that  contain  loops  that  depend  on  the  size  of  data  structures. 
Finally,  in  §IX,  we  discuss  other  possible  applications  of 
quantitative  reasoning  for  proving  liveness  properties  including 
wait-freedom  and  starvation-freedom.  Appendix  II  of  this  article 
contains  all  rules  of  the  logic,  the  semantics,  and  the  full 
soundness  proof. 

II. Non-Blocking Synchronization

Recent  years  have  seen  increasing  interest  in  non-blocking 
data  structures  [1],  [2]:  shared-memory  data  structures  that 


provide  operations  that  are  synchronized  without  using  locks 
and  mutual  exclusion  in  favor  of  performance.  A  non-blocking 
data structure is often considered to be correct if its operations
are  linearizable  [3].  Alternatively,  correctness  can  be  ensured 
by  an  invariant  that  is  maintained  by  each  instruction  of  the 
operations.  Such  an  invariant  is  a  safety  property  that  can 
be  proved  by  modern  separation  logics  for  reasoning  about 
concurrent  programs  [18]. 

Progress Properties: In this article, we focus on
complementary liveness properties that guarantee the progress of
the operations of the data structure. There are three different
progress properties for non-blocking data structures considered
in the literature. To define these, assume there is a fixed but arbitrary
number of threads that are (repeatedly) accessing a shared-
memory data structure exclusively via the operations it provides.
Choose now a point in the execution in which one or more
operations have started.

•  A  wait-free  implementation  guarantees  that  every  thread 
can  complete  any  started  operation  of  the  data  structure 
in  a  finite  number  of  steps  [4]. 

•  A  lock-free  implementation  guarantees  that  some  thread 
will  complete  an  operation  in  a  finite  number  of  steps  [4]. 

•  An  obstruction-free  implementation  guarantees  progress 
for  any  thread  that  eventually  executes  in  isolation  [5] 
(i.e.,  without  other  active  threads  in  the  system). 

Note  that  these  definitions  do  not  make  any  assumptions  on  the 
scheduler.  We  assume  however  that  any  code  that  is  executed 
between  the  data-structure  operations  terminates.  If  a  data 
structure  is  wait-free  then  it  is  also  lock-free  [4].  Similarly, 
lock-freedom  implies  obstruction-freedom  [5].  Wait-free  data 
structures are desirable because they guarantee the absence of
live-locks  and  starvation.  However,  wait-free  data  structures 
are  often  complex  and  inefficient.  Lock-free  data  structures, 
on  the  other  hand,  often  perform  more  efficiently  in  practice. 
They  also  ensure  the  absence  of  live-locks  but  allow  starvation. 
Since  starvation  is  an  unlikely  event  in  many  cases,  lock-free 
data  structures  are  predominant  in  practice  and  we  focus  on 
them  in  this  paper.  However,  our  techniques  apply  in  principle 
also  to  wait-free  data  structures  (see  §IX). 

Treiber’s Stack: As a concrete example we consider
Treiber’s non-blocking stack [6], a classic lock-free data
structure. The shared data structure is a pointer S to a linked
list and the operations are push and pop as given in Figure 1.

The operation push(v) creates a pointer x to a new list node
containing the data v. Then it stores the current stack pointer
S in a local variable t and sets the next pointer of the new
node x to t. Finally it attempts an atomic compare and swap
operation CAS(&S,t,x) to swing S to point to the new node x.
If the stack pointer S still contains t then S is updated and
CAS returns true. In this case, the do-while loop terminates
and the operation is complete. If, however, the stack pointer
S has been updated by another thread so that it no longer
contains t then CAS returns false and leaves S unchanged.
The do-while loop then does another iteration, updating the
new list node to a new value of S. The operation pop works
similarly to push(v). If the stack is empty (t==NULL) then
pop returns EMPTY. Otherwise it repeatedly tries to update
the stack pointer with the successor of the top node using a
do-while loop guarded by a CAS.

    struct Node {
      value_t data;
      Node *next;
    };

    Node *S;

    void init()
    { S = NULL; }

    void push(value_t v) {
      Node *t, *x;
      x = new Node();
      x->data = v;
      do { t = S;
           x->next = t;
      } while (!CAS(&S, t, x));
    }

    value_t pop() {
      Node *t, *x;
      do { t = S;
           if (t == NULL)
             { return EMPTY; }
           x = t->next;
      } while (!CAS(&S, t, x));
      return t->data;
    }

Fig. 1. An implementation of Treiber’s lock-free stack as given by Gotsman et al. [13].

    I := -1;  // initialization

    ping() = if I == TID then { while (true) do { } }
             else { I := TID }

Fig. 2. A shared data structure that shows a limitation of the method of
proving lock-freedom that has been introduced by Gotsman et al. [13]. For
every n, the parallel execution of n ping operations terminates. However, the
data structure is not lock-free. (It is based on an idea from James Aspnes.)

Treiber’s stack is lock-free but not wait-free. If other threads
execute infinitely many operations they could prevent the
operation of a single thread from finishing. The starvation of one
thread is nevertheless only possible if infinitely many operations
from other threads succeed by performing a successful CAS.
The use of do-while loops that are guarded by CAS operations
is characteristic of lock-free data structures.
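For readers who want to experiment with the retry structure of push and pop, here is a small executable model in Python. It is a sketch for illustration, not the implementation studied in this article: the AtomicRef class is a hypothetical stand-in for a hardware CAS (it uses a lock internally only to make the compare and the swap one atomic step), and the names TreiberStack and EMPTY are choices made for this example. Python's garbage collector keeps a node alive while any thread still holds a reference to it, so the sketch sidesteps memory-reclamation issues such as ABA.

```python
import threading


class AtomicRef:
    """A single shared cell with compare-and-swap (hypothetical helper)."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()   # models the atomicity of CAS only

    def get(self):
        return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    __slots__ = ("data", "next")

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


EMPTY = object()   # sentinel returned by pop on an empty stack


class TreiberStack:
    def __init__(self):
        self.S = AtomicRef(None)

    def push(self, v):
        x = Node(v)
        while True:                 # the do-while loop guarded by a CAS
            t = self.S.get()
            x.next = t
            if self.S.cas(t, x):
                return

    def pop(self):
        while True:
            t = self.S.get()
            if t is None:
                return EMPTY
            x = t.next
            if self.S.cas(t, x):
                return t.data
```

A failed `cas` in this model always means that another thread's `cas` succeeded in between, which is the observation the informal lock-freedom argument relies on.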

Lock-Freedom and Termination: Before we verify
Treiber’s stack, we consider lock-freedom in general. Following
an approach proposed by Gotsman et al. [13], we reduce the
problem of proving lock-freedom to proving termination of a
certain class of programs. Let D be any shared-memory data
structure with k operations $\pi_1, \ldots, \pi_k$. It has been argued [13]
that D is lock-free if and only if the following program
terminates for every $n \in \mathbb{N}$ and every $op_1, \ldots, op_n \in \{\pi_1, \ldots, \pi_k\}$:
$D_n = \|_{i=1,\ldots,n}\ op_i$. However, this reduction does not apply
to all shared-memory data structures. Many non-blocking data
structures have operations that can distinguish different callers,
for instance by accessing their thread ID. A simple example
is described in Figure 2. The shared data structure consists of
an integer I and a single operation ping. If ping is executed
twice by the same thread without interference from another
thread then the second execution of ping will not terminate.
Otherwise, each call of ping immediately returns. As a result,
the program $\|_{i=1,\ldots,n}\ ping$ terminates for every n but the data
structure is not lock-free.
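The ping counterexample is easy to replay in a small Python model. The sketch below is an illustration with assumed names (PingObject, run, single_op_programs_terminate); it treats each ping call as one atomic step and represents the infinite loop by returning False instead of actually diverging.

```python
from itertools import permutations


class PingObject:
    def __init__(self):
        self.I = -1               # initialization: I := -1

    def ping(self, tid):
        """Return True if the call returns, False if it would loop forever."""
        if self.I == tid:
            return False          # stands for: while (true) do { }
        self.I = tid
        return True


def run(calls):
    """Execute ping once per entry of `calls` (a sequence of thread IDs)."""
    obj = PingObject()
    return all(obj.ping(tid) for tid in calls)


def single_op_programs_terminate(n):
    """Check every ordering of n threads that each call ping exactly once."""
    return all(run(order) for order in permutations(range(n)))
```

With one ping per thread, no thread ever finds its own ID in I, so every ordering terminates; `run([0, 0])`, where thread 0 calls ping twice in a row, reports divergence. This is why a termination check that allows only one operation per thread cannot detect that the structure is not lock-free.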

We  are  also  aware  of  a  similar  example  that  uses  the  total 
number  of  threads  in  the  system  instead  of  thread  IDs.  It  is 
in  general  very  common  to  use  these  system  properties  in 
non-blocking  data  structures.  Three  of  the  five  examples  in 
our  paper  use  thread  IDs  (the  hazard  pointer  stack,  the  hazard 
pointer  queue,  and  the  elimination-backoff  stack). 

Consequently,  we  have  to  prove  a  stronger  termination 
property  to  prove  that  a  data  structure  is  lock-free.  Instead 
of  assuming  that  each  client  only  executes  one  operation,  we 
assume  that  each  client  can  execute  finitely  many  operations. 


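To this end, we define a set of programs $\mathcal{S}^n$ that sequentially
execute n operations.

$\mathcal{S}^n = \{op_1; \ldots; op_n \mid \forall i : op_i \in \{\pi_1, \ldots, \pi_k\}\}$

Let $\mathcal{S} = \bigcup_{n \in \mathbb{N}} \mathcal{S}^n$. We define the set of programs $\mathcal{P}^m$
that execute m programs in $\mathcal{S}$ in parallel.

$\mathcal{P}^m = \{\|_{i=1,\ldots,m}\ s_i \mid \forall i : s_i \in \mathcal{S}\}$

Finally, we set $\mathcal{P} = \bigcup_{m \in \mathbb{N}} \mathcal{P}^m$. For proving lock-freedom, we
rely on the following theorem. By allowing a fixed but arbitrary
number of operations per thread we avoid the limitations of
the previous approach.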

Theorem  1.  The  data  structure  D  with  operations  tti,  . . . ,  TTfc 
is  lock-free  if  and  only  if  every  program  P  G  V  terminates. 

Proof. Assume first that D is lock-free. Let $P \in \mathcal{P}$. We prove
that P terminates by induction on the number of incomplete
operations in P, that is, operations that have not yet been
started or operations that have been started but have not yet
completed. If no operation is incomplete then P immediately
terminates. If $n$ operations are incomplete then the scheduler
has already started or will start an operation by executing one of the
threads. By the definition of lock-freedom, some operation will
complete independently of the choices of the scheduler. So
after a finite number of steps, we reach a point at which only
$n-1$ incomplete operations are left. The termination argument
follows by induction.

To prove the other direction, assume now that every program
$P \in \mathcal{P}$ terminates. Furthermore, assume for the sake of
contradiction that D is not lock-free. Then there exists
some concurrent program $P_\infty$ that only executes operations
$op \in \{\pi_1, \ldots, \pi_k\}$ and an execution trace T of $P_\infty$ in which
some operations have started but no operation ever completes.
It follows that $P_\infty$ diverges and T is therefore infinite. Let $n$
be the number of threads in $P_\infty$ and let $s_i$ be the sequential
program that consists of all operations that have been started
by thread $i$ in the execution trace T in their temporal order.
Then the program $\big\|_{i=1,\ldots,n}\ s_i \in \mathcal{P}^n$ can be scheduled to produce
the infinite execution trace T. This contradicts the assumption
that every program in $\mathcal{P}$ terminates. □

III.  Quantitative  Reasoning  to  Prove 
Lock-Freedom 

A  key  insight  of  our  work  is  that  for  many  lock-free  data 
structures,  it  is  possible  to  give  an  upper  bound  on  the  total 
number of loop iterations in the programs in $\mathcal{P}$ (§II).

To  see  why,  note  that  most  non-blocking  operations  are 
based  on  the  same  optimistic  approach  to  concurrency.  They 
repeatedly try to access or modify a shared-memory data
structure  until  they  can  complete  their  operation  without 
interference  by  another  thread.  However,  lock-freedom  ensures 
that  such  interference  is  only  possible  if  another  operation 
successfully  makes  progress: 

In  an  operation  of  a  lock-free  data  structure,  the  failure  of 
a  thread  to  make  progress  is  always  caused  by  successful 
progress  in  an  operation  executed  by  another  thread. 


This  property  is  the  basis  of  a  novel  reasoning  technique  that 
we  call  a  quantitative  compensation  scheme.  It  ensures  that  a 
thread  is  compensated  for  loop  iterations  that  are  caused  by 
progress — often  the  successful  completion  of  an  operation — in 
another  thread.  In  return,  when  a  thread  makes  progress  (e.g., 
completes  an  operation),  it  compensates  the  other  threads.  In 
this  way,  every  thread  is  able  to  “pay”  for  its  loop  iterations 
without  being  aware  of  the  other  threads  or  the  scheduler. 

Consider for example Treiber’s stack and a program $P_n$ in
which every thread only executes one operation, that is, $P_n =
\big\|_{i=1,\ldots,n}\ s_i$ and $s_i \in \{push, pop\}$. An execution of $P_n$ never
performs more than $n^2$ loop iterations. Using a compensation
scheme,  this  bound  can  be  verified  in  a  local  and  modular 
way.  Assume  that  each  of  the  threads  has  a  number  of  tokens 
at  its  disposal  and  that  each  loop  iteration  in  the  program 
costs  one  token.  After  paying  for  the  loop  iteration,  the  token 
disappears  from  the  system.  Because  it  is  not  possible  to  create 
or  duplicate  tokens — tokens  are  an  affine  resource — the  number 
of  tokens  that  are  initially  present  in  the  system  is  an  upper 
bound  on  the  total  number  of  loop  iterations  executed. 

Unfortunately,  the  maximum  number  of  loop  iterations 
performed  by  a  thread  depends  on  the  choices  of  the  scheduler 
as  well  as  the  number  of  operations  that  are  performed  by 
the  other  threads.  To  still  make  possible  local  and  modular 
reasoning,  we  define  a  compensation  scheme  that  enables  the 
threads to exchange tokens. Since each loop iteration in $P_n$ is
guarded by a CAS operation, this compensation scheme can
be conveniently integrated into the specification of CAS. To
this end, we require that (logically) $n-1$ tokens have to be
available to execute a CAS.


(a) If the CAS is successful then it returns true and
(logically) 0 tokens. Thus, the executing thread loses
$n-1$ tokens.

(b) If the CAS is unsuccessful then it returns false and
(logically) $n$ tokens. Thus, the executing thread gains a
token that it can use to pay for its next loop iteration.

The idea behind this compensation scheme is that every thread
needs $n$ tokens to perform a data structure operation. One
token is used to pay for the first loop iteration and $n-1$ tokens
are available during the loop as the loop invariant. If the CAS
operation of a thread A is successful (case (a)) then this can
cause at most $n-1$ CAS operations in the other threads to fail.
These $n-1$ failed CAS operations need to return one token
more than they had prior to their execution (case (b)). On the
other hand, the successful thread A does not need its tokens
anymore since it will exit the do-while loop. Therefore the $n-1$
tokens belonging to A are passed to the other $n-1$ threads to
pay for the worst-case scenario in which this update causes
$n-1$ more loop iterations.

If the CAS operation of a thread A fails (case (b)), then some
other thread successfully updated the stack (case (a)) and thus
provided a token for thread A. Since A had $n-1$ tokens before
the execution of the CAS, it has $n$ tokens after the execution.
So thread A can pay a token for the next loop iteration and
maintain its loop invariant of $n-1$ available tokens.

In our example program $P_n$, there are $n^2$ many tokens in the
system at the beginning of the execution. So the number of loop
iterations is bounded by $n^2$ and the program terminates.* More
generally, we can use the same local and modular reasoning
to prove that every program with $n$ threads such that thread $i$
executes $m_i$ operations performs at most $\sum_{1 \le i \le n} m_i \cdot n$ loop
iterations. Thread $i$ then starts with $m_i \cdot n$ tokens.
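
The bound can be checked empirically on a toy model. The following is a minimal sketch, not the verified implementation: it assumes that every operation is a push, that node identifiers are never reused (so the ABA problem cannot occur), and that a random scheduler interleaves the read and the CAS of each loop iteration. The names `run`, `snapshot`, and `remaining` are hypothetical.

```python
import random


def run(ops_per_thread, seed):
    """Simulate threads pushing onto a Treiber-style stack.

    ops_per_thread[i] is the number m_i of operations of thread i.
    Returns the total number of loop iterations that were started.
    """
    rng = random.Random(seed)
    n = len(ops_per_thread)
    top = 0                    # id of the current top node (0 = empty stack)
    next_id = 1                # fresh ids only, so no ABA in this model
    remaining = list(ops_per_thread)
    snapshot = [None] * n      # value of `top` read by the current iteration
    iterations = 0
    while any(remaining):
        # The scheduler picks any thread that still has work to do.
        i = rng.choice([j for j in range(n) if remaining[j] > 0])
        if snapshot[i] is None:
            # Start of a loop iteration: read the shared top pointer.
            snapshot[i] = top
            iterations += 1
        else:
            # CAS(top, snapshot[i], new_node)
            if top == snapshot[i]:
                top = next_id
                next_id += 1
                remaining[i] -= 1      # operation completed
            snapshot[i] = None         # on failure the loop restarts
    return iterations


if __name__ == "__main__":
    ops = [2, 3, 1]
    bound = len(ops) * sum(ops)        # sum over i of m_i * n
    worst = max(run(ops, seed) for seed in range(1000))
    print("worst observed:", worst, "bound:", bound)
```

Every successful CAS can invalidate at most one pending snapshot in each of the other $n-1$ threads, which is why the observed count never exceeds the bound in this model.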

We  will  show  in  the  following  that  this  quantitative  reasoning 
can  be  directly  incorporated  in  total  correctness  proofs  for  these 
programs.  We  use  the  exact  same  techniques  (for  proving  safety 
properties  [18])  to  prove  liveness  properties;  namely  concurrent 
separation  logic,  auxiliary  variables,  and  read  permissions.  The 
only  thing  we  add  to  separation  logic  is  the  notion  of  a  token 
or  a  resource  following  Atkey  [19]. 


IV.  Preliminary  Explanations 


Before  we  formalize  the  proof  outlined  in  §III,  we  give  a 
short  introduction  to  separation  logic,  quantitative  reasoning, 
and  concurrent  separation  logic.  For  the  reader  unfamiliar  with 
the  separation  logic  extensions  of  permissions  and  auxiliary 
variables,  see  Appendix  I  and  the  relevant  literature  [20],  [21]. 
Our  full  logic  is  defined  in  Appendix  II. 

Separation Logic: Separation logic [22], [23] is an
extension of Hoare logic [24] that simplifies reasoning about
shared mutable data structures and pointers. As in Hoare logic,
programs are annotated with Hoare triples using predicates
P, Q, . . . over program states (heap and stack). A Hoare triple
[P] C [Q] for a program C is a total-correctness specification of
C that expresses the following. If C is executed in a program
state that satisfies P then C safely terminates and the execution
results in a state that satisfies Q.

In addition to the traditional logical connectives, predicates of
separation logic are formed by logical connectives that enable
local and modular reasoning about the heap. The separating
conjunction P * Q is satisfied by a program state if the heap
of that state can be split in two disjoint parts such that one
sub-heap satisfies P and one sub-heap satisfies Q. It enables
the safe use of the frame rule

$$\frac{[P]\ C\ [Q]}{[P * R]\ C\ [Q * R]} \quad \text{(Frame)}$$


With  the  frame  rule  it  is  possible  to  specify  only  the  part  of 
the  heap  that  is  modified  by  the  program  C  (using  predicates 
P  and  Q).  This  specification  can  then  be  embedded  in  a  larger 
proof  to  state  that  other  parts  of  the  heap  are  not  changed 
(predicate  R). 
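
As a small worked instance (an illustration, not taken from the paper), let C be the heap update [x] := 1 and let the frame R be $y \mapsto 2$, a cell that C does not touch:

$$\frac{[x \mapsto \_]\ \ [x] := 1\ \ [x \mapsto 1]}{[x \mapsto \_ * y \mapsto 2]\ \ [x] := 1\ \ [x \mapsto 1 * y \mapsto 2]} \quad \text{(Frame)}$$

The premise mentions only the cell that is written. The conclusion states in addition that the cell at y still holds 2 afterwards. This assumes the usual side condition that C does not modify variables that occur free in R.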


*In  fact  there  are  at  most  (j)  loop  iterations  in  the  worst  case.  However, 
the  bound  is  sufficient  to  prove  termination. 
Quantitative Reasoning: Being based on the logic of
bunched implications [25], separation logic treats heap cells as
linear resources in the sense of linear logic. It is technically
unproblematic to extend separation logic to reason about
affine consumable resources too [19]. To this end, the logic is
equipped with a special predicate $\Diamond$, which states the availability
of one consumable resource, or token. The predicate is affine
because it is satisfied by every state in which one or more
tokens are available. This is in contrast to a linear predicate like
$E \mapsto F$ that is only satisfied by heaps H with $|dom(H)| = 1$.

Using the separating conjunction $\Diamond * \Diamond$, it is straightforward
to state that two or more tokens are available. We define $\Diamond^n$ to
be an abbreviation for $n$ tokens $\Diamond * \cdots * \Diamond$ that are connected
by the separating conjunction *.

Since  we  use  consumable  resources  to  model  the  terminating 
behavior  of  programs,  the  semantics  of  the  while  command  are 
extended  such  that  a  single  token  is  consumed,  if  available,  at 
the  beginning  of  each  iteration.  Correspondingly,  the  derivation 
rule  for  while  commands  ensures  that  a  single  token  is  available 
for  consumption  on  each  loop  iteration  and  thus  that  the  loop 
will  execute  safely: 

$$\frac{P \wedge B \Rightarrow P' * \Diamond \qquad I \vdash [P']\ C\ [P]}{I \vdash [P]\ \mathtt{while}\ B\ \mathtt{do}\ C\ [P \wedge \neg B]} \quad \text{(While)}$$

The  loop  body  C  must  preserve  the  loop  invariant  P  under  the 
weakened  precondition  P' .  C  is  then  able  to  execute  under  the 
assumption  that  one  token  has  been  consumed  and  still  restore 
the invariant P, thus making a token available for possible
future  loop  iterations. 
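
The instrumented semantics of the while command can be sketched as a small interpreter. This is a minimal sketch under stated assumptions: the state is a Python dictionary, and an iteration that finds no token raises an exception, which is how this sketch models an unsafe execution. The names `run_while` and `OutOfTokens` are hypothetical.

```python
class OutOfTokens(Exception):
    """Raised when a loop iteration starts and no token is available."""


def run_while(cond, body, state, tokens):
    """Run `while cond(state): body(state)` with one token per iteration.

    Returns the number of tokens that are left when the loop exits.
    """
    while cond(state):
        if tokens == 0:
            raise OutOfTokens("no token left to pay for this iteration")
        tokens -= 1            # the token is consumed and disappears
        body(state)
    return tokens


def countdown(state):
    state["x"] -= 1


if __name__ == "__main__":
    left = run_while(lambda s: s["x"] != 0, countdown, {"x": 3}, tokens=5)
    print("tokens left:", left)
```

Because tokens are never created inside the loop, the initial token count bounds the number of iterations, which is the argument the While rule makes at the level of the logic.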

The tokens $\Diamond$ can be freely mixed with other predicates
using the usual connectives of separation logic. For instance,
the formula $x \mapsto 10 \vee (x \mapsto \_ * \Diamond)$ expresses that the heap-cell
referred  to  by  the  variable  x  points  to  10,  or  the  heap-cell 
points  to  an  arbitrary  value  and  a  token  is  available.  Together 
with  the  frame  rule,  the  tokens  enable  modular  reasoning  about 
quantitative  resource  usage. 

Concurrent  Separation  Logic:  Concurrent  separation 
logic  (CSL)  is  an  extension  of  separation  logic  that  is  used  to 
reason  about  concurrent  programs  [14].  The  idea  is  that  shared 
memory  regions  are  associated  with  a  resource  invariant.  Each 
atomic block that modifies the shared region can assume that
the  invariant  holds  at  the  beginning  of  its  execution  and  must 
ensure  that  the  invariant  holds  at  the  end  of  the  atomic  block. 

The  original  presentation  of  CSL  [14]  uses  conditional 
critical  regions  (CCRs)  for  shared  variables.  In  this  article, 
we  follow  Parkinson  et  al.  [18]  and  assume  a  global  shared 
region  with  one  invariant  so  as  to  simplify  the  syntax  and  the 
logic. An extension to CCRs is possible. For predicates I, P,
and Q, the judgment $I \vdash [P]\ C\ [Q]$ states that under the global
resource invariant I, in a state where P holds, the execution
of the concurrent program C is safe and terminates in a state
that satisfies Q.

Concurrent execution of programs $C_1$ and $C_2$ is written as
$C_1 \parallel C_2$. We assume that shared variables are only accessed
within atomic regions using the command atomic(C) and that
atomic blocks are not nested. An interpretation of the resource
invariant I is that it specifies a part of the heap owned by the
shared region. The logical rule Atom for the command atomic
transfers the ownership of this heap region to the executing
thread.

$$\frac{\mathsf{emp} \vdash [P * I]\ C\ [Q * I]}{I \vdash [P]\ \mathtt{atomic}\{C\}\ [Q]} \quad \text{(Atom)}$$


Because the atomic construct ensures mutual exclusion, it is
safe to share I between two programs that run in parallel.
Pre- and post-conditions of concurrent programs are however
combined by use of the separating conjunction^:

$$\frac{I \vdash [P_1]\ C_1\ [Q_1] \qquad I \vdash [P_2]\ C_2\ [Q_2]}{I \vdash [P_1 * P_2]\ C_1 \parallel C_2\ [Q_1 * Q_2]} \quad \text{(Par)}$$


Most  of  the  other  rules  of  sequential  separation  logic  can  be 
used in CSL by just adding the (unmodified) resource invariant
I  to  the  rules.  The  invariant  is  only  used  in  the  rule  Atom. 

A technical detail that is crucial for the soundness of the
classic rule for conjunction [24] is that we require the resource
invariant I to be precise [14] with respect to the heap [26].
This means that, for a given heap H and stack V, there is at
most one sub-heap $H' \subseteq H$ such that the state $(H', V)$ satisfies
I. All invariants we use in this article are precise. Note that
precision with respect to the resource tokens $\Diamond$ is not required
since they are affine and not linear entities.


V.  Formalized  Quantitative  Reasoning 

In  the  following,  we  show  how  quantitative  concurrent 
separation  logic  can  be  used  to  formalize  the  quantitative 
compensation scheme that we exemplified with Treiber’s
non-blocking stack in §III. The most important rules of this logic
are described in §IV. The logic is formally defined and proved
sound in Appendix II.

Before  we  verify  realistic  non-blocking  data  structures,  we 
describe  the  formalized  quantitative  reasoning  for  a  simple 
producer  and  consumer  example. 

Producer  and  Consumer  Example:  In  the  example  in 
Figure  3,  we  have  a  heap  location  B  that  is  shared  between  a 
number  of  producer  and  consumer  threads.  A  producer  checks 
whether  B  contains  the  integer  0  (i.e.,  B  is  empty).  If  so 
then  it  updates  B  with  a  newly  produced  value  and  terminates. 
Otherwise,  it  leaves  B  unchanged  and  terminates.  A  consumer 
checks whether B contains a non-zero integer (i.e., B is
non-empty). If so then it consumes the integer, sets the contents of
B  to  zero,  and  loops  to  check  if  B  contains  a  new  value  to 
consume.  If  B  contains  0  then  the  consumer  terminates. 

If  we  verify  this  program  using  our  quantitative  separation 
logic  then  we  prove  that  the  number  of  tokens  specified  by 
the  precondition  is  an  upper  bound  on  the  number  of  loop 
iterations of the program. Since the number of specified tokens
is always finite, we have thus proved termination.

The  challenge  in  the  proof  is  that  the  loop  iterations  of 
the  operation  Consumer  depend  on  the  scheduler  and  on  the 
number  of  Producer  operations  that  are  executed  in  parallel. 
However,  it  is  the  case  that  a  program  that  uses  n  Consumer 


^We omit the variable side-conditions here for clarity. They are included in
the full set of derivation rules in Appendix II.
Consumer() $\triangleq$
  [ $\Diamond$ ]
  x := 1
  [ $\Diamond \vee x = 0$ ]  // loop inv.
  while x != 0 do {
    // While rule antecedent:
    // $(\Diamond \vee x = 0) \wedge \neg(x = 0) \Rightarrow \mathsf{emp} * \Diamond$
    [ emp ]
    atomic {
      [ $B \mapsto u * (u = 0 \vee \Diamond)$ ]  // atomic
      x := [B]
      if x != 0 then {
        [ $B \mapsto x * (x = 0 \vee \Diamond) \wedge \neg(x = 0)$ ]
        [ $B \mapsto x * \Diamond$ ]
        [B] := 0
        [ $B \mapsto 0 * \Diamond$ ]
        [ $I * \Diamond$ ]
        [ $I * (\Diamond \vee x = 0)$ ]
      } else {
        skip
        [ $x = 0 \wedge B \mapsto x * (x = 0 \vee \Diamond)$ ]
        [ $I * (\Diamond \vee x = 0)$ ]
      }
    }
    [ $\Diamond \vee x = 0$ ]  // end atom. block
  }
  [ $(\Diamond \vee x = 0) \wedge (x = 0)$ ]  // end while
  [ emp ]

Producer(y) $\triangleq$
  [ $\Diamond$ ]
  atomic {
    [ $\Diamond * I$ ]  // atom. block
    if [B] = 0 then {
      [ $\Diamond * B \mapsto u * (u = 0 \vee \Diamond)$ ]
      [B] := y
      [ $\Diamond * B \mapsto y$ ]
      [ $(\Diamond \vee y = 0) * B \mapsto y$ ]
      [ $I$ ]
    } else {
      skip
    }
  }
  [ emp ]  // end atom. block

Fig. 3. A lock-free data structure B with the operations Consumer and
Producer. The operation Consumer terminates if finitely many Producer
operations are executed in parallel. The verification of lock-freedom and
memory safety uses a compensation scheme and quantitative concurrent
separation logic.



operations and $m$ Producer operations performs at most $n + m$
loop iterations. We can prove this claim using our quantitative
separation logic by deriving the following specifications.

$$I \vdash [\Diamond]\ \mathit{Consumer}()\ [\mathsf{emp}] \quad \text{and} \quad I \vdash [\Diamond]\ \mathit{Producer}(y)\ [\mathsf{emp}]$$

However, the modular and local specifications of these
operations only hold in an environment in which all programs
adhere to a certain policy. This policy can be expressed as
a resource invariant I in the sense of concurrent separation
logic. Intuitively, I states that the shared memory location B is
read-writable, and either is empty (B = 0) or there is a token
$\Diamond$ available. We define

$$I = \exists u.\ B \mapsto u * (u = 0 \vee \Diamond).$$

Now we can read the specifications of Consumer and Producer
as follows. The token $\Diamond$ in the precondition of Consumer is
used to pay for the first loop iteration. More loop iterations
are only possible if some producer updated the contents of
heap location B to a non-zero integer v before the execution
of the atomic block of Consumer. We then rely on the fact
that the producer respected the resource invariant I. If $B \mapsto u$
and $u \neq 0$ then the only possibility of maintaining I is by
providing a token $\Diamond$. The operation Consumer then updates B
to zero and can thus establish the invariant I without using a
token.  So  the  token  in  the  invariant  becomes  available  to  pay 
for  the  next  loop  iteration.  Figure  3  contains  an  outline  of  the 
proof  for  Producer  and  Consumer.  Note  that  our  proof  also 
verifies  memory  safety. 
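
The claimed bound can be observed on a small simulation. This is a minimal sketch, assuming that B is initially 0, that each atomic block runs as one indivisible scheduler step, and that produced values are non-zero. The function name `simulate` and the list `pending` are hypothetical.

```python
import random


def simulate(n_consumers, m_producers, seed):
    """Run n Consumer and m Producer operations under a random scheduler.

    Returns the number of loop iterations executed by all consumers.
    """
    rng = random.Random(seed)
    B = 0                      # shared heap location, 0 means empty
    iterations = 0
    # One entry per operation that still has an atomic block to run.
    pending = ["consumer"] * n_consumers + ["producer"] * m_producers
    while pending:
        k = rng.randrange(len(pending))
        if pending[k] == "producer":
            if B == 0:
                B = rng.randint(1, 9)   # deposit a non-zero value
            pending.pop(k)              # a producer never loops
        else:
            iterations += 1             # one iteration of the while loop
            x = B
            if x != 0:
                B = 0                   # consume, then loop again
            else:
                pending.pop(k)          # x = 0: the consumer terminates
    return iterations


if __name__ == "__main__":
    worst = max(simulate(3, 4, seed) for seed in range(1000))
    print("worst observed:", worst, "bound:", 3 + 4)
```

Each consumer pays for its final iteration itself, and every additional iteration consumes a value that some producer deposited, which mirrors the token that the producer leaves in the invariant.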

From Local Proofs to Lock-Freedom: Using the derived
specifications of the operations and the frame rule, we
inductively prove $I \vdash [\Diamond^n]\ op_1; \ldots; op_n\ [\mathsf{emp}]$ where each $op_j$ is
a Consumer or Producer operation. In other words, we have
then proved $I \vdash [\Diamond^n]\ s\ [\mathsf{emp}]$ for all $s \in \mathcal{S}^n$ (recall the definition
from §II). Let now $s_i \in \mathcal{S}^{m_i}$ for $1 \le i \le n$. Using the rule
Par, we can then prove for $m = \sum_{1 \le i \le n} m_i$

$$I \vdash [\Diamond^m]\ \big\|_{1 \le i \le n}\ s_i\ [\mathsf{emp}].$$

This shows that the program $\big\|_{1 \le i \le n}\ s_i$ performs at most
$m+1$ loop iterations (one token can be present in the resource
invariant I) when it is executed. Following the discussion in
§II, this proves that every program $p \in \mathcal{P}$ terminates and that
(B; Producer, Consumer) is a lock-free data structure.
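
For illustration (a worked instance that is not part of the original text), take two Consumer operations and one Producer operation, each run by its own thread. Applying the rule Par twice to the three specifications gives

$$I \vdash [\Diamond * \Diamond * \Diamond]\ \mathit{Consumer}() \parallel \mathit{Consumer}() \parallel \mathit{Producer}(y)\ [\mathsf{emp} * \mathsf{emp} * \mathsf{emp}]$$

The precondition is $\Diamond^3$, so $m = 3$ here, and the postcondition is equivalent to emp. Counting the one token that may already be stored in I, the program performs at most $m + 1 = 4$ loop iterations.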

Similarly, we can in general derive a termination proof for
every program in $\mathcal{P}$ from such specifications of the operations
of a data structure. Assume that a shared-memory data structure
$(S; \pi_1, \ldots, \pi_k)$ is given. Assume furthermore that we have
verified for all $1 \le i \le k$ the specification
$I(n) \vdash [\Diamond^{f(n)} * P]\ \pi_i\ [P]$. The notations $I(n)$ and $f(n)$ indicate that the proof
can use a meta-variable $n$ which ranges over $\mathbb{N}$. However, the
proof is uniform for all $n$. Additionally, P might contain a
variable tid for the thread ID. From this specification, the
lock-freedom of S already follows. To see why, we can argue
as in the producer-consumer example. First, it follows for
every $n$ and $s \in \mathcal{S}^m$ that $I(n) \vdash [\Diamond^{m \cdot f(n)} * P]\ s\ [P]$. Second,
a loop bound for $p = \big\|_{0 \le i < n}\ s_i$ with $s_i \in \mathcal{S}^{m_i}$ is
derived as follows. We use the rule Par to prove for
$m = \sum_{0 \le i < n} m_i \cdot f(n)$ that

$$I(n) \vdash [\Diamond^m * \circledast_{0 \le tid < n} P(tid)]\ p\ [\circledast_{0 \le tid < n} P(tid)].$$

Thus every $p \in \mathcal{P}$ terminates and according to the proof in §II,
the data structure $(S; \pi_1, \ldots, \pi_k)$ is lock-free.

VI. Lock-Freedom of Treiber’s Stack
We now formalize the informal proof of the lock-freedom
of Treiber’s stack that we described in §III. In Appendix III,
we  outline  how  the  proof  can  be  easily  extended  to  also  verify 
memory  safety.  Figure  4  shows  the  implementation  of  Treiber’s 
stack  in  the  while  language  we  use  in  this  article. 

Each  thread  that  executes  push  or  pop  operations  can  be 
in  one  of  two  states.  It  either  has  some  expectation  on  the 
contents  of  the  shared  data  structure  S  (critical  state)  or  it  does 
not  have  any  expectation  (non-critical  state).  More  concretely, 
a  thread  is  in  a  critical  state  if  and  only  if  it  is  executing  a 
push  or  pop  operation  and  is  in  between  the  two  atomic  blocks 
in the while loop. The thread then expects that t = [S]. The
resource  invariant  that  we  will  formalize  in  quantitative  CSL 
can  be  described  as  follows. 

For  each  thread  T  in  the  system  one  of  the  following  holds. 

(1)  The  thread  T  is  in  a  critical  state  and  its  expectation 
on  the  shared  data  structure  is  true.  (2)  The  thread  T  is 
in  a  critical  state  and  some  other  thread  provided  T  with 
a  token.  (3)  The  thread  T  is  in  a  non-critical  state. 

To  formalize  this  invariant,  we  have  to  expose  the  local 
assumption of the threads (t = [S]) to the global state. This is
why  we  use  auxiliary  array  A.  If  the  thread  with  the  thread 
ID  tid  is  in  a  critical  state  then  A[tid]  contains  the  value  of 
its  local  variable  t.  Otherwise  A[tid]  contains  0.  Similarly,  we 
have  a  second  auxiliary  array  C  such  that  C[tid]  contains  a 
non-zero  integer  if  and  only  if  the  thread  with  ID  tid  is  in  a 
critical  state.  As  shown  in  Figure  4,  the  arrays  A  and  C  are 




S := alloc(1); [S] := 0;
A := alloc(max_tid); C := alloc(max_tid);

push(v) $\triangleq$
  pushed := false;
  x := alloc(2);
  [x] := v;
  [ $(pushed \vee \Diamond^n) * \gamma_r(tid,\_,\_)$ ]  // loop invariant
  while ( !pushed ) do {
    // While rule antecedent:
    // $(pushed \vee \Diamond^n) * \gamma_r(tid,\_,\_) \wedge \neg pushed \Rightarrow \Diamond^{n-1} * \gamma_r(tid,\_,\_) * \Diamond$
    atomic {
      [ $\Diamond^{n-1} * \gamma_r(tid,\_,\_) * S \mapsto u * \alpha(tid,u) * I'(tid,u)$ ]  // atom block
      [ $\Diamond^{n-1} * \gamma(tid,\_,\_) * S \mapsto u * I'(tid,u)$ ]  // impl. & read perm.
      t := [S];
      [ $\Diamond^{n-1} * \gamma(tid,\_,\_) * (S \mapsto u \wedge t = u) * I'(tid,u)$ ]  // read & frame
      C[tid] := 1
      [ $\Diamond^{n-1} * \gamma(tid,\_,1) * (S \mapsto u \wedge t = u) * I'(tid,u)$ ]  // assignment
      A[tid] := t
      [ $\Diamond^{n-1} * (A[tid] \mapsto t \wedge t = u) * C[tid] \mapsto 1 * S \mapsto u * I'(tid,u)$ ]
      [ $\Diamond^{n-1} * \gamma_r(tid,t,1) * S \mapsto u * \alpha(tid,u) * I'(tid,u)$ ]  // perm.
      [ $\Diamond^{n-1} * \gamma_r(tid,t,1) * I$ ]  // exist. intro & (3)
    };
    [ $\Diamond^{n-1} * \gamma_r(tid,t,1)$ ]  // atomic block & frame
    // [x+1] := t; this is not essential for lock-freedom
    atomic {
      [ $\Diamond^{n-1} * \gamma_r(tid,t,1) * I$ ]  // atomic block
      [ $\Diamond^{n-1} * \gamma_r(tid,t,1) * S \mapsto u * \alpha(tid,u) * I'(tid,u)$ ]  // exist. elim.
      s := [S]; if s == t then {
        [ $\Diamond^{n-1} * \gamma(tid,\_,\_) * S \mapsto t * \circledast_{i \in \{0,\ldots,n-1\} \setminus \{tid\}} \gamma_r(i,\_,\_)$ ]
        [S] := x;
        [ $\gamma(tid,\_,\_) * S \mapsto x * I'(tid,x)$ ]  // permissions & (4)
        pushed := true
        [ $(pushed \vee \Diamond^n) * \gamma(tid,\_,\_) * \exists u.\ S \mapsto u * I'(tid,u)$ ]
      } else {
        [ $\Diamond^{n-1} * t \neq u \wedge \gamma_r(tid,t,1) * \alpha(tid,u) * S \mapsto u * I'(tid,u)$ ]
        [ $\Diamond^n * \gamma(tid,t,1) * S \mapsto u * I'(tid,u)$ ]  // impl. using (5)
        skip
        [ $(pushed \vee \Diamond^n) * \gamma(tid,\_,\_) * \exists u.\ S \mapsto u * I'(tid,u)$ ]
      };
      C[tid] := 0
      [ $(pushed \vee \Diamond^n) * \gamma(tid,\_,0) * S \mapsto u * I'(tid,u)$ ]
      // write & exist. elim (above) and permissions & impl
      [ $(pushed \vee \Diamond^n) * \gamma_r(tid,\_,\_) * \alpha(tid,u) * S \mapsto u * I'(tid,u)$ ]
      [ $(pushed \vee \Diamond^n) * \gamma_r(tid,\_,\_) * I$ ]  // exist. intro
    };
    [ $(pushed \vee \Diamond^n) * \gamma_r(tid,\_,\_)$ ]  // atomic block end
  }

Fig. 4. An implementation of the push operation of Treiber’s lock-free stack
in our language and the verification of the while loop. The CAS operation
is implemented using an atomic block that updates the local variable pushed.
The auxiliary array A contains in A[tid] the value of the local variable t of
the thread with ID tid or zero if the thread does not assume t = [S]. The loop
invariant $pushed \vee \Diamond^n$ states that either the new element x has been pushed
to the stack S or there are $n$ tokens available. The predicates $\gamma$ and $\gamma_r$ are
defined in (1) and (2).

never  used  on  the  right-hand  side  of  an  assignment  and  are 
only  updated  in  the  two  atomic  blocks  of  each  operation. 

Let $n$ be the number of threads in the system. We define

$$I = \exists u.\ S \mapsto u * \circledast_{0 \le i < n} \alpha(i,u)$$

$$\alpha(i,u) = \exists a, c.\ C[i] \overset{r}{\mapsto} c * A[i] \overset{r}{\mapsto} a * (c = 0 \vee a = u \vee \Diamond)$$

The resource invariant I states that the shared region has a
full permission for the heap location S that points to the value
u. Additionally, the predicate $\alpha(i,u)$ states for each thread $i$
that the shared region has read permissions for C[i] and A[i],
and that thread $i$ is in a non-critical section (c = 0), that the
local variable t contains the value [S] (a = u), or that there is
a token $\Diamond$ available.

We use read permissions since threads need access to the
local predicate $A[tid] \overset{r}{\mapsto} t$ at some point to infer that A[tid]
contains the value of the local variable t. This relation of the
local variable t with the array A is the only technical difficulty
in the proof. Just as in safety proofs, we can now use the rules
of our quantitative concurrent separation logic to verify the
following Hoare triples.

$$I \vdash [\gamma_r(tid,\_,\_) * \Diamond^n]\ push(v)\ [\gamma_r(tid,\_,\_)]$$

$$I \vdash [\gamma_r(tid,\_,\_) * \Diamond^n]\ pop()\ [\gamma_r(tid,\_,\_)]$$

Here $\gamma$ and $\gamma_r$ are defined as:

$$\gamma(t,a,c) = A[t] \mapsto a * C[t] \mapsto c \qquad (1)$$

$$\gamma_r(t,a,c) = A[t] \overset{r}{\mapsto} a * C[t] \overset{r}{\mapsto} c \qquad (2)$$


Thus,  the  execution  of  any  operation  requires  n  tokens  and 
read  permission  to  the  heap  locations  A[tid]  and  C[tid].  After 
execution,  the  tokens  are  consumed  and  we  are  left  with  the 
read  permissions.  Figure  4  contains  a  proof  outline  for  the 
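while loop of push. We use the following abbreviation for parts
of the invariant I that are not needed in the local proof.

$$I'(j,u) = \circledast_{i \in \{0,\ldots,n-1\} \setminus \{j\}} \alpha(i,u)$$

We have for all values u and $j \in \{0, \ldots, n-1\}$ that

$$I = \exists u.\ S \mapsto u * \alpha(j,u) * I'(j,u) \qquad (3)$$

$$\Diamond^{n-1} * \circledast_{i \in \{0,\ldots,n-1\} \setminus \{j\}} \gamma_r(i,\_,\_) \Rightarrow I'(j,u) \qquad (4)$$

$$t \neq u \wedge \gamma_r(j,t,1) * \alpha(j,u) \Rightarrow \Diamond * \gamma(j,t,1) \qquad (5)$$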


Using  these  assertions,  the  verification  of  push  and  pop  is  a 
straightforward  application  of  the  rules  of  our  logic.  Figure  4 
describes  the  main  part — the  while  loop — of  the  proof  of  push. 
The loop invariant $pushed \vee \Diamond^n$ states that either the new
element x has been pushed onto the stack S or there are
$n$ tokens available. In the first atomic block we leave the
assumptions $I'(tid,u)$ of the other threads untouched and just
establish $A[tid] \overset{r}{\mapsto} t * C[tid] \overset{r}{\mapsto} 1$.

The key aspect of the proof is the second atomic block which
corresponds to the CAS operation in the original code. In the
if case, we possibly break the assumptions of the other threads
([S] := x). Then we have to use $n-1$ tokens and implication
(4) to re-establish $I'(tid,u)$. Since the variable pushed is set
to true, we can maintain the loop invariant without using
another token. In the else case we use the inequality $t \neq u$
and implication (5) to derive the loop invariant. Finally, we
re-establish $\alpha(tid,u)$ using $C[tid] \mapsto 0$.

The  verification  of  the  while  loop  of  pop  is  similar.  By 
applying  the  proof  from  the  end  of  §V  to  the  specifications 
of  push  and  pop,  we  have  then  proved  the  lock-freedom  of 
Treiber’s  stack. 
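
The token bookkeeping of this proof can be replayed in a simulation. This is a minimal sketch, not the formal proof: it models only push, uses fresh node identifiers so that no ABA situation arises, and represents the token in $\alpha(i,u)$ by a counter `slot[i]`. The arrays `A` and `C` play the role of the auxiliary arrays; `slot`, `local`, and `phase` are hypothetical names.

```python
import random


def simulate_push(n, ops_each, seed):
    """Replay the compensation scheme for n threads, each doing ops_each pushes.

    Returns the total number of loop iterations. Assertions check the
    resource invariant and the token counts after every scheduler step.
    """
    rng = random.Random(seed)
    S = 0                      # shared top-of-stack pointer
    A = [0] * n                # auxiliary: value of t expected by thread i
    C = [0] * n                # auxiliary: 1 iff thread i is in a critical state
    slot = [0] * n             # tokens stored in the invariant for thread i
    local = [0] * n            # tokens owned by thread i
    todo = [ops_each] * n
    phase = ["idle"] * n       # idle -> read -> cas
    fresh = 1
    iterations = 0
    while any(todo):
        i = rng.choice([j for j in range(n) if todo[j] > 0])
        if phase[i] == "idle":
            local[i] = n       # precondition of push: n tokens
            phase[i] = "read"
        elif phase[i] == "read":
            assert local[i] >= 1
            local[i] -= 1      # pay for this loop iteration
            iterations += 1
            A[i], C[i] = S, 1  # first atomic block: t := [S]
            phase[i] = "cas"
        else:
            if S == A[i]:      # second atomic block: CAS succeeds
                S = fresh
                fresh += 1
                assert local[i] == n - 1
                for j in range(n):
                    if j != i:
                        slot[j] = 1    # implication (4), tokens are affine
                local[i] = 0
                todo[i] -= 1
                phase[i] = "idle"
            else:              # CAS fails: implication (5) yields a token
                assert slot[i] >= 1
                slot[i] -= 1
                local[i] += 1
                phase[i] = "read"
            C[i] = 0
        # Resource invariant: c = 0, or a = u, or a token is available.
        assert all(C[j] == 0 or A[j] == S or slot[j] >= 1 for j in range(n))
    return iterations


if __name__ == "__main__":
    print(max(simulate_push(3, 2, seed) for seed in range(500)), "<=", 3 * 3 * 2)
```

A failed CAS always finds a token in its own slot because the thread that changed S paid for it, so no thread ever needs to inspect the scheduler or the other threads' local state.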

An interesting aspect of the proof is that it is not essential for
a thread to know the entire resource invariant I. The only part
that is needed is the implication
$S \mapsto \_ * \Diamond^n * \circledast_{0 \le i < n} A[i] \overset{r}{\mapsto} \_ \Rightarrow I$.
This can be used to make the assumptions
$\mathcal{A}(i)$ of the threads on the global data structure abstract.
The implication
$S \mapsto \_ * \Diamond^n * \circledast_{0 \le i < n} A[i] \overset{r}{\mapsto} \_ \Rightarrow \exists u.\ S \mapsto \_ * \circledast_{0 \le i < n}((A[i] \overset{r}{\mapsto} \_ * \Diamond) \vee \mathcal{A}(i,u))$
holds for all
predicates $\mathcal{A}(i,u)$. A natural candidate for such an abstraction
is (concurrent) abstract predicates [27], [28]. However, such an
abstraction is not needed for our goal of verifying non-blocking
data structures in this paper.

VII.  Advanced  Lock-Free  Data  Structures 

In  this  section  we  investigate  to  what  extent  our  quantitative 
proof  technique  can  be  used  to  prove  the  lock-freedom  of  more 
complex  shared-memory  data  structures. 

In  many  cases,  it  is  possible  to  derive  a  bound  on  the  total 
number  of  loop  iterations  like  we  do  for  Treiber’s  stack.  Table  5 
gives  an  overview  of  our  findings.  It  describes  for  several 
different non-blocking data structures the number $t(n)$ of tokens
that are needed per operation in a system with $n$ threads. The
derived loop bound on a system with $n$ threads that executes $m$
operations is then $t(n) \cdot m$. In the hazard-pointer data structures,
the natural number f is a fixed global parameter of the data
structure.  The  details  are  discussed  in  the  following. 

Michael  and  Scott’s  Non-Blocking  Queue:  Michael  and 
Scott’s  non-blocking  queue  [7]  implements  a  FIFO  queue  using 
a  linked  list  with  two  pointers  to  the  head  and  the  tail  of  the 
list.  New  nodes  are  inserted  at  the  tail  and  nodes  are  removed 
from  the  head. 

To implement the queue in a lock-free way, the insert
operation can leave the data structure in an apparently inconsistent
state:  The  new  node  is  inserted  at  the  tail  using  a  CAS-guarded 
loop,  similar  to  Treiber’s  stack.  The  pointer  to  the  tail  is  then 
updated  by  a  second  CAS  operation,  allowing  other  threads  to 
access  the  data  structure  with  an  inaccurate  tail  pointer. 

To  deal  with  this  problem,  the  operations  of  the  queue 
maintain  the  invariant  that  the  tail  pointer  points  to  the  last  or 
second-to-last  node  during  the  execution  and  to  the  last  node 
after  the  execution  of  the  operation.  To  maintain  this  invariant, 
each  CAS-guarded  loop  checks  if  the  tail  pointer  points  to  a 
node  whose  next  pointer  is  Null.  In  this  case,  the  tail  pointer 
is  up  to  date  and  the  current  iteration  of  the  while  loop  can 
continue.  Otherwise,  the  tail  pointer  is  updated  to  point  to  the 
last  node  of  the  list  and  the  while  loop  is  restarted. 
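
The two-CAS insertion and the helping step for a lagging tail pointer can be illustrated with a small Python sketch. It is not the algorithm of [7] verbatim: compare-and-swap is emulated with a lock, memory reclamation is left to the garbage collector, and the names `Ref`, `Node`, and `MSQueue` are hypothetical.

```python
import threading


class Ref:
    """A mutable cell with an emulated compare-and-swap (lock-based stand-in)."""
    _lock = threading.Lock()

    def __init__(self, value=None):
        self.value = value

    def cas(self, expected, new):
        with Ref._lock:
            if self.value is expected:
                self.value = new
                return True
            return False


class Node:
    def __init__(self, data=None):
        self.data = data
        self.next = Ref(None)


class MSQueue:
    def __init__(self):
        sentinel = Node()
        self.head = Ref(sentinel)
        self.tail = Ref(sentinel)

    def enqueue(self, data):
        node = Node(data)
        while True:
            tail = self.tail.value
            nxt = tail.next.value
            if nxt is None:
                # Tail is up to date: try to link the new node (first CAS).
                if tail.next.cas(None, node):
                    # Second CAS swings the tail; a failure is harmless
                    # because another thread has already helped.
                    self.tail.cas(tail, node)
                    return
            else:
                # Tail lags behind: advance it and restart the loop.
                self.tail.cas(tail, nxt)

    def dequeue(self):
        while True:
            head = self.head.value
            tail = self.tail.value
            nxt = head.next.value
            if nxt is None:
                return None  # queue is empty
            if head is tail:
                self.tail.cas(tail, nxt)  # help a lagging tail first
                continue
            if self.head.cas(head, nxt):
                return nxt.data
```

The `else` branch of `enqueue` is the iteration that the extra token in the invariant pays for: it only occurs when some other thread has linked a node without yet updating the tail.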

To  prove  the  lock-freedom  of  Michael  and  Scott’s  queue, 
we  extend  the  invariant  I  that  we  used  in  the  verification  of 
Treiber’s  stack  with  an  additional  condition:  The  next  pointer 
of  the  node  pointed  to  by  the  tail  pointer  is  Null  or  there  is  a 
token  that  can  be  used  to  pay  for  an  additional  loop  iteration. 

$$\exists u, t, w.\ \mathit{head} \mapsto u * \mathit{tail} \mapsto t * \mathit{tail} + 1 \mapsto w * \circledast_{0 \le i < n} \beta(i,u,t) * (w = \mathit{nil} \vee \Diamond)$$

The formulas $\beta(i,u,t)$ are analogous to the formulas $\alpha(i,u)$
in the invariant that we used for the verification of Treiber’s
stack. With this invariant in a system with $n$ threads, we can
verify the operations of the queue using $n + 1$ tokens in the
respective preconditions.

Hazard  Pointers:  A  limitation  of  Treiber’s  non-blocking 
stack  is  that  it  is  only  sound  in  the  presence  of  garbage 
collection.  This  is  due  to  the  ABA  problem  (see  for  instance  [8]) 
which  appears  in  many  algorithms  that  use  compare-and-swap 


Data Structure                   Tokens per Operation

Treiber’s Stack [6]              $n$
Michael and Scott’s Queue [7]    $n + 1$
Hazard-Pointer Stack [8]         $n + (\ell \cdot n)$
Hazard-Pointer Queue [8]         $(n + 1) + (\ell \cdot n)$
Elimination-Backoff Stack [9]    $n(n + 1)$


Fig. 5. Quantitative reasoning for popular non-blocking data structures. The
table shows the number $t(n)$ of tokens that are needed per operation in a
system with $n$ threads. The derived loop bound on a system with $n$ threads
that executes $m$ operations is then $t(n) \cdot m$. $\ell$ is a fixed global parameter of
the data structure.

operations: Assume that a shared location which contains A
is read by a thread $t_1$. Then thread $t_2$ gets activated by the
scheduler, modifies the shared location to B, and then back
to A. Now thread $t_1$ gets activated again, falsely assumes that
the shared data has not been changed, and continues with its
operation. The result can be a corrupted shared data structure,
invalid memory access or an incorrect return value.
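
The interleaving just described can be replayed deterministically. The following Python sketch simulates the schedule sequentially, so the `cas` method is not actually atomic; the class and function names are hypothetical and only serve the illustration.

```python
class Cell:
    """A shared location with a simulated compare-and-swap.

    The schedule below is replayed sequentially, so no real atomicity
    is required for this illustration."""

    def __init__(self, value):
        self.value = value

    def cas(self, expected, new):
        if self.value == expected:
            self.value = new
            return True
        return False


def aba_interleaving():
    shared = Cell("A")
    modifications = 0

    # Thread t1 reads the shared location and is then descheduled.
    seen_by_t1 = shared.value

    # Thread t2 runs: A -> B, then B -> A.
    if shared.cas("A", "B"):
        modifications += 1
    if shared.cas("B", "A"):
        modifications += 1

    # Thread t1 resumes. Its CAS succeeds although the location was
    # modified twice in the meantime.
    t1_succeeded = shared.cas(seen_by_t1, "C")
    return t1_succeeded, modifications
```

The value comparison in `cas` cannot distinguish "never changed" from "changed and changed back", which is exactly the false assumption made by thread $t_1$.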

Michael [8] proposes hazard pointers to enable the safe
reclamation of memory while maintaining the lock-freedom
of non-blocking data structures. The idea is to introduce a
global array that contains for each thread a number of hazard
pointers^ to data nodes that are currently in use by the thread.
Additionally, each thread stores a local list of pointers that it
wants to remove from the shared data structure (for instance
by using pop in the case of a stack). After each successful
removal of a node, a thread checks if this local list has reached
a fixed length threshold. If so, it checks the hazard pointers
of each other thread to ensure that the pointers are not in use
before reclaiming the space.
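
A toy model of this scheme, with one hazard pointer per thread, might look as follows in Python. The class `HazardPointerDomain` and its methods are hypothetical names; appending to `reclaimed` stands in for freeing memory, and the scan deliberately uses plain nested loops so that its step count is the length of the local list times the number of threads.

```python
class HazardPointerDomain:
    """Toy model of hazard-pointer reclamation (all names are illustrative)."""

    def __init__(self, num_threads, threshold):
        self.hazard = [None] * num_threads               # one hazard pointer per thread
        self.retired = [[] for _ in range(num_threads)]  # per-thread local lists
        self.threshold = threshold                       # fixed length threshold
        self.reclaimed = []                              # stands in for freed memory

    def protect(self, tid, node):
        """Announce that thread `tid` is currently using `node`."""
        self.hazard[tid] = node

    def retire(self, tid, node):
        """Record a removed node; scan once the local list hits the threshold.

        Returns the number of inner-loop steps performed by the scan."""
        self.retired[tid].append(node)
        if len(self.retired[tid]) >= self.threshold:
            return self.scan(tid)
        return 0

    def scan(self, tid):
        steps = 0
        still_retired = []
        for node in self.retired[tid]:
            in_use = False
            for other, pointer in enumerate(self.hazard):
                steps += 1
                if other != tid and pointer is node:
                    in_use = True
            if in_use:
                still_retired.append(node)   # keep it for a later scan
            else:
                self.reclaimed.append(node)  # safe to reclaim
        self.retired[tid] = still_retired
        return steps
```

With 3 threads and a threshold of 2, a scan performs 2 · 3 = 6 inner steps, independent of how the other threads are scheduled.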

The  use  of  hazard  pointers  does  not  affect  the  global  resource 
invariants  that  we  use  in  our  quantitative  verification  technique. 
The  reason  is  that  hazard  pointers  only  affect  parts  of  the 
operations  that  are  outside  the  loops  that  are  guarded  by  CAS 
operations.  Moreover,  the  worst-case  number  of  loop  iterations 
in this additional code can be easily determined: It is the
maximal length $\ell$ of the local list multiplied by the maximal
number of threads in the system.

For  Treiber’s  stack  with  hazard  pointers,  the  specifications 
of  push  and  pop  are: 

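$$I \vdash [\gamma_r(\mathit{tid}, \_) * \Diamond^{n}]\ \mathtt{push}(v)\ [\gamma_r(\mathit{tid}, \_)]$$

$$I \vdash [\gamma_r(\mathit{tid}, \_, \_) * \Diamond^{n + (\ell \cdot n)}]\ \mathtt{pop}()\ [\gamma_r(\mathit{tid}, \_, \_)]$$

Here $\gamma_r$ is defined as in (1). The resource invariant $I$ is
the same as in the specification of the version without hazard
pointers.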

Elimination  Backoff:  To  improve  the  performance  of 
Treiber’s  non-blocking  stack  in  the  presence  of  high  contention, 
one  can  use  an  elimination  backoff  scheme  [9].  The  idea  is 
based  on  the  observation  that  a  push  operation  followed  by  a 
pop  results  in  a  stack  that  is  identical  to  the  initial  stack.  In  this 
case,  the  two  operations  can  be  eliminated  without  accessing 
the stack at all: The two threads use a different shared-memory
cell to transfer the stack element.

^In most cases, this set is just a singleton.

Our method can also be used to prove that Hendler et al.’s
elimination-backoff stack [9] is lock-free. The main challenge
in the proof is that the push and pop operations consist of two
nested loops that are guarded by CAS operations. Assume again
a system with $n$ threads. The inner loop can be treated just as in
Treiber’s stack, using $n$ tokens in the precondition and 0 tokens
in the postcondition. As a result, the number of tokens needed
for an iteration of the outer loop is $n + 1$. That means that a
successful thread needs to transfer $(n - 1) \cdot (n + 1) = n^2 - 1$
tokens to the other threads to account for additional loop
iterations in the other threads. Given this, we can verify the
elimination-backoff stack using $n(n + 1)$ tokens in the precondition.

More  details  on  the  verification  can  be  found  in  Appendix  IV. 
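
One way to read the token count for the elimination-backoff stack is as a sum of two contributions. The split below is an assumption made for illustration: the thread is taken to pay $n + 1$ tokens for its own outer-loop iteration and to hand $n + 1$ tokens to each of the other $n - 1$ threads.

```latex
\begin{align*}
\underbrace{(n + 1)}_{\text{own outer iteration}}
  + \underbrace{(n - 1)\cdot(n + 1)}_{\text{transferred to other threads}}
  &= (n + 1) + (n^2 - 1) \\
  &= n^2 + n \\
  &= n(n + 1)
\end{align*}
```

For $n = 3$ threads this gives $4 + 8 = 12$ tokens per operation.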

Non-Blocking  Maps  and  Sets:  Quantitative  compensation 
schemes  can  also  be  used  to  prove  the  lock-freedom  of  non- 
blocking  maps  and  sets  (e.g.,  [29],  [30]). 

As  in  other  lock-free  data  structures,  interference  in  the  map 
and  set  operations  is  only  caused  if  the  operation  of  another 
thread  makes  progress.  For  example,  in  the  case  of  Harris’  non- 
blocking  linked  list  [29],  a  thread  will  only  make  an  additional 
traversal  (of  the  list)  if  there  is  interference  caused  by  another 
thread  that  makes  a  successful  traversal.  The  number  of  these 
additional  unsuccessful  traversals  can  be  bounded  using  the 
same  quantitative  compensation  scheme  as  in  our  previous 
examples. 

The number of loop iterations within each list traversal
depends, however, on the length of the list. Nevertheless,
it is possible to prove an upper bound on the number of
loop iterations executed by programs in V. The reason is
that each of the $n$ threads executes a fixed number $m_i$ of
operations. Thus the total number of operations is bounded
by $m = \sum_i m_i$. For many important shared-memory
data structures, such as lists or maps, $m$ (or a function of
$m$) constitutes an upper bound on the size of the shared data
structure. One can then use this bound to prove an upper bound
on the number of loop iterations by introducing $\Diamond^m$ in the
global resource invariant. Like Atkey [19] we can use ideas
from amortized resource analysis [31] to deal with variable-size
data structures. By assigning tokens to each element of
a data structure we derive bounds that depend on the size of
the data structure without explicitly referring to its size. For
instance, an inductive list-predicate that states that $k \cdot |\ell|$ tokens
are available, where $\ell$ is the list pointed to by $u$, can be defined
as follows.

$$\mathit{LSeg}'(x, y, k) \iff (x = y \wedge \mathsf{emp}) \vee (\exists v, z.\ x \mapsto v * x + 1 \mapsto z * \mathit{LSeg}'(z, y, k) * \Diamond^{k})$$
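
The amount of potential stored by such a predicate can be computed by unfolding it along the list. The Python sketch below assumes a hypothetical heap encoding in which each address maps to a `(value, next)` pair; the function name is illustrative.

```python
def lseg_tokens(nodes, x, y, k):
    """Tokens carried by a list segment from x to y when every cell
    holds k tokens, mirroring one unfolding of the predicate per cell.

    `nodes` maps an address to a (value, next_address) pair; this
    encoding is an assumption made for the example."""
    tokens = 0
    while x != y:
        _value, next_address = nodes[x]
        tokens += k          # the k tokens attached to this cell
        x = next_address
    return tokens
```

For a three-element list with `k = 2`, the segment carries 6 tokens, which is $k \cdot |\ell|$ without the length ever being mentioned in the predicate itself.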

VIII.  Related  Work 

There is a large body of research on verifying safety properties
and partial correctness of non-blocking data structures. See
for  instance  [18],  [32],  [33]  and  the  references  therein.  This 
work  deals  however  with  the  verification  of  the  complementary 
liveness  property  of  being  lock-free,  which  in  comparison  has 
received  little  attention. 


Colvin  and  Dongol  [10],  [34]  use  manually-derived  global 
well-founded  orderings  and  temporal  logic  to  prove  the  lock- 
freedom  of  Treiber’s  stack  [6],  Michael  and  Scott’s  queue  [7], 
and  a  bounded  array-based  queue.  Their  technique  is  not 
modular  but  rather  a  whole  program  analysis  of  the  most 
general  client  of  the  data  structure.  It  is  unclear  whether  the 
approach  applies  to  data-structure  operations  with  nested  loops. 
In  contrast,  our  method  is  modular,  can  deal  with  nested  loops, 
and  does  not  require  temporal  logic. 

Petrank  et  al.  [11]  attempt  to  reduce  lock-freedom  to  a  safety 
property  by  introducing  the  more  restrictive  concept  of  bounded 
lock-freedom.  It  states  that,  in  a  concurrent  program,  there  has 
to  be  progress  after  at  most  k  steps,  where  k  can  depend  on 
the  input  size  of  the  program  but  not  on  the  number  of  threads 
in  the  system.  They  verify  bounded  lock-freedom  with  a  whole 
program  analysis  using  temporal  logic  and  the  model  checker 
Chess.  The  technique  is  demonstrated  by  verifying  a  simple 
concurrent  program  that  uses  Treiber’s  stack.  Our  compensation- 
based  quantitative  reasoning  does  not  provide  such  an  explicit 
bound  on  the  steps  between  successful  operations  but  rather  a 
global  bound  on  the  number  of  loop  iterations  in  the  system. 
Additionally,  our  bound  depends  on  the  number  of  threads 
in  the  system  and  not  on  the  size  of  the  input.  A  conceptual 
difference  of  our  work  is  that  we  prove  the  lock-freedom  of  a 
given  data  structure  as  opposed  to  the  verification  of  a  specific 
program.  Moreover,  our  proofs  are  local  and  modular,  and  not  a 
whole  program  analysis.  We  also  show  that  compensation-based 
reasoning  works  for  many  advanced  lock-free  data  structures. 

Gotsman et al. [13] reduce lock-freedom proofs to termination
proofs of programs that execute $n$ single data structure
operations  in  parallel.  They  then  prove  termination  using 
separation  logic  and  temporal  rely-guarantee  reasoning  by 
layering  liveness  reasoning  on  top  of  a  circular  safety  proof. 
Using  several  tools  and  manually  formulating  appropriate 
proof  obligations,  they  are  able  to  automatically  verify  the 
lock-freedom  of  involved  algorithms  such  as  Hendler  et  al.’s 
non-blocking  stack  with  elimination  backoff  [9].  While  these 
automation  results  are  very  impressive,  the  used  reduction 
to  termination  is  not  intended  to  be  applied  to  shared  data 
structures  that  use  thread  IDs  or  other  system  information  (see 
§II  for  details).  In  comparison,  our  compensation  reasoning 
does  not  restrict  the  use  of  thread  IDs  or  other  system 
information.  However,  the  termination  proofs  of  [13]  would 
also  work  for  a  modification  of  the  reduction  that  we  introduced 
in  this  paper. 

Tofan et al. [12] describe a fully-mechanized technique based
on temporal logic and rely-guarantee reasoning that is similar
to the work of Gotsman et al. However, they assume weak
fairness of the scheduler while we do not pose any restriction
on the scheduler. Kobayashi and Sangiorgi [35] propose a type
system that proves lock-freedom for programs written in the
π-calculus. The target language and examples seem, however,
to be quite different from the programs we prove lock-free in
this article.

IX. Conclusion

We  have  shown  that  lock-freedom  proofs  of  shared-memory 
data  structures  can  be  reduced  to  safety  proofs  in  concurrent 
separation  logic  (CSL).  To  this  end,  we  proposed  a  novel 
quantitative  compensation  scheme  which  can  be  formalized 
in  CSL  using  a  predicate  0  for  affine  tokens.  While  similar 
logics  have  been  used  to  verify  the  resource  consumption  of 
sequential  programs  [19],  this  is  the  first  time  that  a  quantitative 
reasoning  method  has  been  used  to  verify  liveness  properties 
of  concurrent  programs. 

In the future, we plan to investigate the extent to which
quantitative reasoning can be applied to other liveness properties
of concurrent programs. The quantitative verification of
wait-freedom seems to be similar to the verification of
lock-freedom if we require that tokens cannot be transferred among
the threads. Obstruction-freedom might require the creation of
tokens in case of a conflict. We also plan to adapt our method
to properties of locking data structures, such as fairness and
starvation-freedom. These properties are more challenging to verify with
our quantitative method since they rely on a fair scheduler,
whereas non-blocking algorithms do not. To enable such proofs,
we plan to extend our compensation scheme to include the
behavior of the scheduler.

Ultimately, we envision integrating our compensation-based
proofs into a logic for termination-sensitive contextual
refinement. We are currently developing such a logic but its
description is beyond the scope of this work.

Appendix

I.  Further  Preliminary  Explanations 

Permissions: It is sometimes necessary to share information
in the form of a predicate between the invariant and a
local assertion. This can be achieved in CSL by the use of
permissions [21].

The predicate $E \mapsto F$ expresses that the heap location
denoted by $E$ contains the value that $F$ denotes. Another
natural reading of the predicate in the context of separation
logic is that $E \mapsto F$ grants the permissions of reading from
and writing to the heap location denoted by $E$ (permission
reading). Building upon this interpretation, read permissions
state that a full read/write permission $E \mapsto F$ can be shared
by two threads if the heap location denoted by $E$ will not be
modified. A full permission and two read permissions can be
interchanged using the following equivalence.^

$$E \mapsto F \iff E \stackrel{r}{\mapsto} F * E \stackrel{r}{\mapsto} F$$

The two read permissions can then be shared between two
threads. To write into a location, a thread needs a full
permission and to read a location it only needs a read
permission.

$$[x \mapsto \_]\ [x] := E\ [x \mapsto E] \quad \text{(Write)}$$

$$[E \stackrel{r}{\mapsto} F]\ x := [E]\ [E \stackrel{r}{\mapsto} F \wedge x = F] \quad \text{(Read)}$$

The  remaining  rules  of  the  concurrent  separation  logic  can 
remain  unchanged  in  the  presence  of  permissions. 

Auxiliary Variables: If the rules of (concurrent) separation
logic are not sufficient to prove a property of a program then
we sometimes have to use auxiliary variables [20]. These
are variables that we add to the program to monitor but not
influence the computation of the original program. Thus, if we
prove a property about a program using auxiliary variables then
this property also holds for the program without the auxiliary
variables.

More formally, we say a set Aux of variables is auxiliary
for a program $P$ if the following holds. If $x := E$ is an
assignment in $P$ and $E$ contains a variable in Aux then $x \in$
Aux. Additionally, auxiliary variables must not occur in loop
or conditional tests.
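
The two conditions can be checked mechanically once a program is summarized by its assignments and tests. The Python sketch below assumes a hypothetical summary format: each assignment is a pair of the target variable and the set of variables in its right-hand side, and each loop or conditional test is the set of variables it mentions.

```python
def is_auxiliary(aux, assignments, tests):
    """Check whether the variable set `aux` is auxiliary for a program.

    assignments: iterable of (target, variables_in_right_hand_side)
    tests:       iterable of variable sets used in loop/conditional tests
    (Both summaries are an assumed encoding for this example.)"""
    aux = set(aux)
    for target, rhs_variables in assignments:
        # An auxiliary variable may only flow into auxiliary variables.
        if set(rhs_variables) & aux and target not in aux:
            return False
    # Auxiliary variables must not influence control flow.
    return all(not (set(test_variables) & aux) for test_variables in tests)
```

For the program `x := y + 1; a := a + x; while x < 3 do ...`, the set `{"a"}` is auxiliary, whereas adding an assignment `x := a` would violate the first condition.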

II.  Formal  Development  and  Soundness 

In the following, we give the formalization and soundness
proof of our quantitative concurrent separation logic for total
correctness. The proof is inspired by Vafeiadis’ soundness
proof [26] of concurrent separation logic and Atkey’s soundness
proof of his (sequential) quantitative separation logic [19]. However,
we not only prove memory safety but also termination.

First,  we  address  the  syntax  and  semantics  of  our  language 
and  logic  in  detail.  See  Figure  8  for  the  full  operational 
semantics  of  our  language  and  Figure  9  for  the  Hoare-style 
derivation  rules  of  the  logic.  The  semantics  are  standard  with 

^A read permission is equivalent to a fractional permission with the fraction
0.5.

$$E ::= x \mid n \mid E + E \mid E - E \mid \ldots$$

$$B ::= E = E \mid E < E \mid \neg B \mid B \vee B \mid \ldots$$

$$C ::= \mathsf{skip} \mid x := E \mid x := [E] \mid [E] := E \mid x := \mathsf{alloc}(n) \mid \mathsf{dispose}(E) \mid C; C \mid C \parallel C \mid \mathsf{if}\ B\ \mathsf{then}\ C\ \mathsf{else}\ C \mid \mathsf{while}\ B\ \mathsf{do}\ C \mid \mathsf{atomic}\ C \mid \{C\}$$

Fig. 6. A basic while language with concurrency and dynamic allocation.


the  exception  of  the  While-Loop,  While-Skip,  and  While- 
Abort  rules,  which  deal  with  safe  and  unsafe  loops  in  a 
program.  Similarly,  the  derivation  rules  include  an  extended 
While  rule  that  provides  a  logical  specification  that  ensures 
that  while  loops  are  terminating. 

Language: We use a basic while language with concurrency
as commonly used in the context of concurrent separation
logic  [36],  [14],  [26].  As  defined  in  Figure  6,  it  is  built  from 
integer  expressions  E,  boolean  expressions  B,  and  commands 
C.  As  in  Parkinson  et  al.  [18],  we  assume  a  global  shared 
heap  region.  An  extension  to  conditional  critical  regions  [14] 
is  possible  but  omitted  in  favor  of  clarity.  We  assume  that  each 
built-in  function  terminates.  For  simplicity,  we  do  not  include 
procedure  calls  in  the  language.  This  is  an  orthogonal  issue 
that  is  dealt  with  elsewhere  [27]. 

Semantics:  Formulas  and  programs  are  interpreted  with 
respect  to  a  program  state  using  a  small-step  operational 
semantics.  Since  the  logic  includes  a  consumable  resource 
predicate,  a  program  state  consists  not  only  of  a  heap  and 
a  stack  but  also  of  a  natural  number  t  which  represents  the 
number  of  consumable  resources  that  are  currently  available 
to the program. To execute the body of a while loop there
has to be at least 1 resource available, that is $t > 0$. After the
execution of the loop body, there are $t - 1$ resources left.

Let $\mathit{Stack} = \mathit{Var} \to \mathit{Val}$ be the set of stacks and
$\mathit{Heap} = \mathit{Loc} \rightharpoonup_{\mathrm{fin}} \mathit{Val}$ be the set of heaps. Then, the set of program
states is $\mathit{State} = \mathit{Heap} \times \mathit{Stack} \times \mathbb{N}$. The last component
describes the number of available tokens.

The rules of the semantics are defined in Figure 8. They
define an evaluation judgment of the forms

$$C, \sigma \to C', \sigma' \quad \text{or} \quad C, \sigma \to \bot$$

where $C$ and $C'$ are commands and $\sigma, \sigma' \in \mathit{State}$. Intuitively,
this judgment states the following. If we execute the command
$C$ in the state $\sigma$ then the next computational step results in an
error ($C, \sigma \to \bot$), or it transforms the program state to $\sigma'$ and
execution continues with command $C'$. A deviation from the
standard rules is in the semantics for a while loop. They ensure
that a token is consumed if the body of the loop is executed. If
the loop condition is satisfied and no token is available ($t = 0$)
in the current state, then the result is an error.
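
The token-consuming treatment of while loops can be sketched as a tiny interpreter. The Python code below is a simplification under stated assumptions: the loop condition and the loop body are modelled as Python functions on stacks, the body is assumed to be loop-free and to execute in one go, and the names `step_while`, `run_while`, and `ERROR` are hypothetical.

```python
ERROR = "error"  # stands in for the error configuration


def step_while(cond, state):
    """One step of `while cond do body` from state (heap, stack, t)."""
    heap, stack, t = state
    if not cond(stack):
        return "skip", state                 # loop condition false: exit
    if t == 0:
        return ERROR, None                   # condition true but no token left
    return "loop", (heap, stack, t - 1)      # consume one token, run the body


def run_while(cond, body, state):
    """Iterate step_while; `body` maps a stack to the stack after one
    execution of the loop body (assumed token-free)."""
    while True:
        tag, next_state = step_while(cond, state)
        if tag == "skip":
            return next_state
        if tag == ERROR:
            return ERROR
        heap, stack, t = next_state
        state = (heap, body(stack), t)
```

Running a loop that needs three iterations with exactly three tokens ends in a state with zero tokens; with only two tokens the run ends in the error configuration instead of diverging.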

An interesting feature of our semantics is that it does
not admit infinite chains of execution steps. We prove this
by defining a well-founded order $\prec$ on program states and
commands. To this end, we first define the size $|C|$ of a
command $C$ as follows.


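Definition 1 (Size of Commands). Let $C$ be a command. $|C|$
is inductively defined as follows.

$$\begin{array}{rcl}
|\mathsf{skip}| &=& 0 \\
|C_1; C_2| &=& |C_1| + |C_2| + 1 \\
|C_1 \parallel C_2| &=& |C_1| + |C_2| + 1 \\
|\mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2| &=& \max(|C_1|, |C_2|) + 1 \\
|\mathsf{while}\ B\ \mathsf{do}\ C| &=& |C| + 1 \\
|\mathsf{atomic}\ C| &=& |C| + 1 \\
|\{C\}| &=& |C| \\
|C| &=& 1 \quad \text{otherwise}
\end{array}$$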


Definition 2 (Well-Founded Order). Let $\sigma = (H, V, t)$, $\sigma' =
(H', V', t')$ be program states and let $C$, $C'$ be commands. We
define $(C', \sigma') \prec (C, \sigma)$ iff $t' < t$ or ($t' = t$ and $|C'| < |C|$).


Proposition 1. The relation $\prec$ is a well-founded order on
program states.

Lemma 1. If $C, \sigma \to C', \sigma'$ then $(C', \sigma') \prec (C, \sigma)$.


Proof. By inspection of the operational semantics rules. □


As a consequence of Lemma 1 and the well-foundedness of
$<$ on the natural numbers, there are no infinite chains of the
form $C_1, \sigma_1 \to C_2, \sigma_2 \to \cdots$.
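
The size function and the order can be written down directly. The Python sketch below assumes a hypothetical encoding of commands as tagged tuples and follows the clauses of Definition 1 as read from the text; a configuration is abbreviated to a pair of a command and its token count, since heap and stack do not influence the order.

```python
def size(cmd):
    """Size |C| of a command encoded as a tagged tuple (assumed encoding)."""
    tag = cmd[0]
    if tag == "skip":
        return 0
    if tag in ("seq", "par"):            # ("seq", C1, C2) / ("par", C1, C2)
        return size(cmd[1]) + size(cmd[2]) + 1
    if tag == "if":                      # ("if", B, C1, C2)
        return max(size(cmd[2]), size(cmd[3])) + 1
    if tag == "while":                   # ("while", B, C)
        return size(cmd[2]) + 1
    if tag == "atomic":                  # ("atomic", C)
        return size(cmd[1]) + 1
    if tag == "block":                   # ("block", C) stands for {C}
        return size(cmd[1])
    return 1                             # assignment, lookup, mutate, ...


def precedes(config_after, config_before):
    """(C', t') precedes (C, t) iff t' < t, or t' = t and |C'| < |C|."""
    (cmd_after, t_after), (cmd_before, t_before) = config_after, config_before
    return t_after < t_before or (
        t_after == t_before and size(cmd_after) < size(cmd_before)
    )
```

Unfolding a loop makes the command larger, but the step consumes a token, so the configuration still decreases in the lexicographic order.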

Theorem 2. There exist no infinite chains of the form $C_1, \sigma_1 \to
C_2, \sigma_2 \to \cdots$.

Definition 3. For a program state $\sigma$ and a command $C$ we
write $C, \sigma \Downarrow \sigma'$ if $C, \sigma \to^{*} \mathsf{skip}, \sigma'$. Similarly, we write
$C, \sigma \Downarrow \bot$ if $C, \sigma \to^{*} \bot$.

An inspection of the rules of the operational semantics shows
that each terminal state has the form $\mathsf{skip}, \sigma$.

Definition 4 (Termination). We say that a program $C$ terminates
from an initial state $\sigma$ if not $C, \sigma \Downarrow \bot$.

Because our semantics do not allow infinite evaluation chains,
we relate this definition to a usual small-step semantics without
tokens. Since this relation is not important for the formal
development we keep the discussion short. This semantic
judgement is of the form $C, \tau \Rightarrow C', \tau'$ or $C, \tau \Rightarrow \bot$, where
$C, C'$ are commands and $\tau, \tau' \in \mathit{Heap} \times \mathit{Stack}$. The rules of the
semantics are identical to the rules of our quantitative semantics
with the token component removed. The only exceptions are
the rules for while loops which are replaced by the following
rules.

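$$\dfrac{[\![B]\!](V)}{\{\mathsf{while}\ B\ \mathsf{do}\ C\}, (H, V) \Rightarrow \{C; \mathsf{while}\ B\ \mathsf{do}\ C\}, (H, V)} \quad \text{(W-Loop)}$$

$$\dfrac{\neg [\![B]\!](V)}{\{\mathsf{while}\ B\ \mathsf{do}\ C\}, \tau \Rightarrow \mathsf{skip}, \tau} \quad \text{(W-Skip)}$$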

Theorem 3. Let $C$ be a command and let $\sigma = (H, V, t)$
be a state. If not $C, \sigma \Downarrow \bot$ then there is no infinite chain
of the form $C, (H, V) \Rightarrow C_1, \tau_1 \Rightarrow C_2, \tau_2 \Rightarrow \cdots$ and not
$C, (H, V) \Rightarrow^{*} \bot$.
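
$$\begin{array}{lcl}
\sigma \models \Diamond &\iff& t > 0 \wedge \mathrm{dom}(H) = \emptyset \\
\sigma \models P * Q &\iff& \exists H_1, H_2, t_1, t_2.\ H = H_1 \oplus H_2 \wedge t = t_1 + t_2 \wedge (H_1, V, t_1) \models P \wedge (H_2, V, t_2) \models Q \\
\sigma \models P \mathrel{-\!\!*} Q &\iff& \forall H', t'.\ \text{if } H \oplus H' \text{ defined} \wedge (H', V, t') \models P \text{ then } (H \oplus H', V, t + t') \models Q \\
\sigma \models E \mapsto F &\iff& \mathrm{dom}(H) = \{[\![E]\!](V)\} \wedge H([\![E]\!](V)) = ([\![F]\!](V), \top) \\
\sigma \models E \stackrel{r}{\mapsto} F &\iff& \mathrm{dom}(H) = \{[\![E]\!](V)\} \wedge H([\![E]\!](V)) = ([\![F]\!](V), r)
\end{array}$$

Fig. 7. A sample of the semantics of assertions over a state $\sigma = (H, V, t)$.
The semantics of the other connectives and predicates are standard.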

To prove the theorem, we first prove for every $t \in \mathbb{N}$ and
every program state $\tau = (H, V)$ that if $C, \tau \Rightarrow C', (H', V')$
then either $C, (H, V, t) \to C', (H', V', t')$ for some $t'$ or
$C, (H, V, t) \to \bot$. This follows immediately by an inspection
of the rules. The only interesting case is the treatment of while
loops for which the property is easily verified.

Given this, we see that the notion of termination $C, \sigma \Downarrow \sigma'$
corresponds exactly to the standard notion of termination under
a semantics without a resource component.

Concurrent  Separation  Logic  with  Quantitative  Reasoning: 
Following  the  presentation  of  Atkey  [19],  we  define  the 
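predicates of quantitative separation logic as follows. Since we
only deal with one resource at a time we write $\Diamond$ instead of
Atkey’s $R$.

$$P ::= B \mid P \vee P \mid P \wedge P \mid \neg P \mid P \Rightarrow P \mid \forall x.P \mid \exists x.P \mid \Diamond \mid \mathsf{emp} \mid E \mapsto E \mid E \stackrel{r}{\mapsto} E \mid P * P \mid P \mathrel{-\!\!*} Q \mid \circledast_{i \in I} P$$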

Following previous work [21], [26], we model assertions in the
logic with permission heaps. Heap locations are instrumented
with a permission in $\{r, \top\}$ where $r$ is read-only and $\top$ is
full permission. Permission heaps can be added using the $\oplus$
operator, which adds permissions where they overlap (and
are both $r$), and takes the disjoint union elsewhere. The
operational semantics is independent of the permissions. So
we define it for heaps without permissions, which can be
derived from permission heaps by deleting the permission
component. Figure 7 contains the semantics of the most
interesting connectives and predicates.

The rules of the program logic are given in Figure 9.

Soundness: In keeping with the presentation given in [26],
we define satisfaction of Hoare triples according to the
inductively defined predicate $\mathit{safe}_n(C, \sigma, I, Q)$ which states
that command $C$ will execute safely for up to $n$ steps starting
in state $\sigma$ under resource invariant $I$ and if it terminates, the
resulting state will satisfy $Q$.


Definition 5 (Safety). For any state $\sigma = (H, V, t)$, command
$C$, and predicates $I$, $Q$:

• $\mathit{safe}_0(C, \sigma, I, Q)$ holds.

• $\mathit{safe}_{n+1}(C, \sigma, I, Q)$ holds when all of the following are
true:

1) If $C = \mathsf{skip}$ then $\sigma \models Q$.

2) For all $t_I \in \mathbb{N}$ and all $H_I, H_F \in \mathit{Heap}$ such that
$(H_I, V, t_I) \models I$ and $H \oplus H_I \oplus H_F$ is defined,
$C, (H \oplus H_I \oplus H_F, V, t + t_I) \not\to \bot$.

3) For all $t_I, t' \in \mathbb{N}$, $H_I, H_F, H' \in \mathit{Heap}$, and $V' \in
\mathit{Stack}$ such that $(H_I, V, t_I) \models I$ and $H \oplus H_I \oplus
H_F$ is defined, if $C, (H \oplus H_I \oplus H_F, V, t + t_I) \to
C', (H', V', t')$, then there exist $H''$, $H'_I$ and $t''$ such
that $H' = H'' \oplus H'_I \oplus H_F$, $t'' \le t'$, $(H'_I, V', t'') \models I$
and $\mathit{safe}_n(C', (H'', V', t' - t''), I, Q)$.

When $n > 0$, the first condition specifies that if the execution
is in a terminal state, then that state satisfies the postcondition $Q$.
The second condition states that the execution will not go wrong.
The third condition ensures that each step preserves the resource
invariant $I$, and that after executing one step, the resulting
program is safe for another $n - 1$ steps. In the second and third
conditions, $H_I$ and $t_I$ represent the resources required to satisfy
the global invariant $I$. $H_F$ represents additional heap cells
which may be needed by other parts of the program. Note that
we do not include a frame $t_F$ of consumable resources. Since
predicate satisfaction is monotonic with respect to consumable
resources, we do not need to distinguish between consumable
resources in the shared region ($t_I$) and those in the frame.
Also, since the operational semantics only work on concrete
heaps, in condition (3) $H_F$ will necessarily contain any heap
locations that an executing thread has read permission to. By
using the same $H_F$ before and after an execution step we thus
ensure that a thread cannot modify a heap location unless it
has full permission at that location.

Given this, we say that a Hoare triple $[P]\ C\ [Q]$ is satisfiable
under an invariant $I$ if and only if for all $n \in \mathbb{N}$ and
all states $\sigma \models P$, $\mathit{safe}_n(C, \sigma, I, Q)$ holds. For a discussion
of the motivations behind this particular characterisation of
satisfaction, see [26].

Before  we  present  the  proof  of  soundness  of  the  logic,  we 
need  to  consider  two  aspects  of  the  logic  and  how  they  interact 
with  consumable  resources:  Permission  Heaps  [21]  and  Precise 
Assertions  [14]. 

Permission Heaps: Let $\mathit{Perm} = \{r, \top\}$ be a permissions
set, with $r$ indicating read-only permission and $\top$ indicating
full permission. Then, let $\mathit{PHeap} = \mathit{Loc} \rightharpoonup_{\mathrm{fin}} \mathit{Val} \times \mathit{Perm}$ be
the set of permission heaps. A permission heap $H \in \mathit{PHeap}$
is a finite mapping from locations to pairs of values and
permissions. $\mathit{Perm}$ is equipped with a commutative partial
operator $\oplus$ defined as $r \oplus r = \top$, and undefined otherwise.

We extend the permission operator $\oplus$ to value-permission
pairs as follows:

$$(v_1, p_1) \oplus (v_2, p_2) = \begin{cases} (v_1, \top) & \text{if } v_1 = v_2 \text{ and } p_1 = p_2 = r \\ \text{undefined} & \text{otherwise} \end{cases}$$
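
$$\{x := E\}, (H, V, t) \to \mathsf{skip}, (H, V[x = [\![E]\!](V)], t) \quad \text{(Assign)}$$

$$\dfrac{\ell = [\![E]\!](V) \quad \ell \in \mathrm{dom}(H)}{\{x := [E]\}, (H, V, t) \to \mathsf{skip}, (H, V[x = H(\ell)], t)} \quad \text{(Lookup)}$$

$$\dfrac{[\![E]\!](V) \notin \mathrm{dom}(H)}{\{x := [E]\}, (H, V, t) \to \bot} \quad \text{(Lookup-Abort)}$$

$$\dfrac{\ell = [\![E]\!](V) \quad \ell \in \mathrm{dom}(H)}{\{[E] := F\}, (H, V, t) \to \mathsf{skip}, (H[\ell = [\![F]\!](V)], V, t)} \quad \text{(Mutate)}$$

$$\dfrac{[\![E]\!](V) \notin \mathrm{dom}(H)}{\{[E] := F\}, (H, V, t) \to \bot} \quad \text{(Mutate-Abort)}$$

$$\dfrac{\forall i \in \{0, \ldots, n - 1\}.\ \ell + i \notin \mathrm{dom}(H)}{\{x := \mathsf{alloc}(n)\}, (H, V, t) \to \mathsf{skip}, (H[\ell + 0, \ldots, \ell + n - 1 = 0], V[x = \ell], t)} \quad \text{(Alloc)}$$

$$\dfrac{\ell = [\![E]\!](V) \quad \ell \in \mathrm{dom}(H)}{\mathsf{dispose}(E), (H, V, t) \to \mathsf{skip}, (H \setminus \ell, V, t)} \quad \text{(Dispose)}$$

$$\dfrac{[\![E]\!](V) \notin \mathrm{dom}(H)}{\mathsf{dispose}(E), (H, V, t) \to \bot} \quad \text{(Dispose-Abort)}$$

$$\dfrac{C_1, \sigma \to C_1', \sigma'}{\{C_1; C_2\}, \sigma \to \{C_1'; C_2\}, \sigma'} \quad \text{(Seq1)}$$

$$\{\mathsf{skip}; C_2\}, \sigma \to C_2, \sigma \quad \text{(Seq2)}$$

$$\dfrac{C_1, \sigma \to \bot}{\{C_1; C_2\}, \sigma \to \bot} \quad \text{(Seq-Abort)}$$

$$\dfrac{C_1, \sigma \to C_1', \sigma'}{\{C_1 \parallel C_2\}, \sigma \to \{C_1' \parallel C_2\}, \sigma'} \quad \text{(Par1)}$$

$$\dfrac{C_2, \sigma \to C_2', \sigma'}{\{C_1 \parallel C_2\}, \sigma \to \{C_1 \parallel C_2'\}, \sigma'} \quad \text{(Par2)}$$

$$\{\mathsf{skip} \parallel \mathsf{skip}\}, \sigma \to \mathsf{skip}, \sigma \quad \text{(Par3)}$$

$$\dfrac{C_1, \sigma \to \bot}{\{C_1 \parallel C_2\}, \sigma \to \bot} \quad \text{(Par-Abort1)}$$

$$\dfrac{C_2, \sigma \to \bot}{\{C_1 \parallel C_2\}, \sigma \to \bot} \quad \text{(Par-Abort2)}$$

$$\dfrac{[\![B]\!](V)}{\{\mathsf{if}\ B\ \mathsf{then}\ C_t\ \mathsf{else}\ C_f\}, (H, V, t) \to C_t, (H, V, t)} \quad \text{(If-True)}$$

$$\dfrac{\neg [\![B]\!](V)}{\{\mathsf{if}\ B\ \mathsf{then}\ C_t\ \mathsf{else}\ C_f\}, (H, V, t) \to C_f, (H, V, t)} \quad \text{(If-False)}$$

$$\dfrac{[\![B]\!](V) \quad t > 0}{\{\mathsf{while}\ B\ \mathsf{do}\ C\}, (H, V, t) \to \{C; \mathsf{while}\ B\ \mathsf{do}\ C\}, (H, V, t - 1)} \quad \text{(While-Loop)}$$

$$\dfrac{\neg [\![B]\!](V)}{\{\mathsf{while}\ B\ \mathsf{do}\ C\}, \sigma \to \mathsf{skip}, \sigma} \quad \text{(While-Skip)}$$

$$\dfrac{[\![B]\!](V) \quad t = 0}{\{\mathsf{while}\ B\ \mathsf{do}\ C\}, \sigma \to \bot} \quad \text{(While-Abort)}$$

$$\dfrac{C, \sigma \to^{*} \mathsf{skip}, \sigma'}{\{\mathsf{atomic}\ C\}, \sigma \to \mathsf{skip}, \sigma'} \quad \text{(Atom)}$$

$$\dfrac{C, \sigma \to^{*} \bot}{\{\mathsf{atomic}\ C\}, \sigma \to \bot} \quad \text{(Atom-Abort)}$$

Fig. 8. Small-step operational semantics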


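$$I \vdash [P]\ \mathsf{skip}\ [P] \quad \text{(Skip)}$$

$$\dfrac{x \notin \mathrm{fv}(I)}{I \vdash [P[E/x]]\ x := E\ [P]} \quad \text{(Assign)}$$

$$I \vdash [E \mapsto \_]\ [E] := F\ [E \mapsto F] \quad \text{(Mutate)}$$

$$\dfrac{x \notin \mathrm{fv}(I, E, F)}{I \vdash [E \mapsto F]\ x := [E]\ [E \mapsto F \wedge x = F]} \quad \text{(Lookup)}$$

$$I \vdash [E \mapsto \_]\ \mathsf{dispose}(E)\ [\mathsf{emp}] \quad \text{(Dispose)}$$

$$\dfrac{x \notin \mathrm{fv}(I)}{I \vdash [\mathsf{emp}]\ x := \mathsf{alloc}(n)\ [x + 0 \mapsto 0 \wedge \ldots \wedge x + n - 1 \mapsto 0]} \quad \text{(Alloc)}$$

$$\dfrac{I \vdash [P]\ C_1\ [Q] \quad I \vdash [Q]\ C_2\ [R]}{I \vdash [P]\ C_1; C_2\ [R]} \quad \text{(Seq)}$$

$$\dfrac{I \vdash [P_1]\ C_1\ [Q_1] \quad I \vdash [P_2]\ C_2\ [Q_2] \quad \mathrm{fv}(I, P_1, C_1, Q_1) \cap \mathrm{wr}(C_2) = \emptyset \quad \mathrm{fv}(I, P_2, C_2, Q_2) \cap \mathrm{wr}(C_1) = \emptyset}{I \vdash [P_1 * P_2]\ C_1 \parallel C_2\ [Q_1 * Q_2]} \quad \text{(Par)}$$

$$\dfrac{I \vdash [P \wedge B]\ C_t\ [Q] \quad I \vdash [P \wedge \neg B]\ C_f\ [Q]}{I \vdash [P]\ \mathsf{if}\ B\ \mathsf{then}\ C_t\ \mathsf{else}\ C_f\ [Q]} \quad \text{(If)}$$

$$\dfrac{P \wedge B \Rightarrow P' * \Diamond \quad I \vdash [P']\ C\ [P]}{I \vdash [P]\ \mathsf{while}\ B\ \mathsf{do}\ C\ [P \wedge \neg B]} \quad \text{(While)}$$

$$\dfrac{I \vdash [P]\ C\ [Q] \quad \mathrm{fv}(R) \cap \mathrm{wr}(C) = \emptyset}{I \vdash [P * R]\ C\ [Q * R]} \quad \text{(Frame)}$$

$$\dfrac{\mathsf{emp} \vdash [P * I]\ C\ [Q * I]}{I \vdash [P]\ \mathsf{atomic}\ C\ [Q]} \quad \text{(Atom)}$$

$$\dfrac{I * J \vdash [P]\ C\ [Q]}{I \vdash [P * J]\ C\ [Q * J]} \quad \text{(Share)}$$

$$\dfrac{I \vdash [P_1]\ C\ [Q] \quad I \vdash [P_2]\ C\ [Q]}{I \vdash [P_1 \vee P_2]\ C\ [Q]} \quad \text{(Disjunction)}$$

$$\dfrac{P' \Rightarrow P \quad I \vdash [P]\ C\ [Q] \quad Q \Rightarrow Q'}{I \vdash [P']\ C\ [Q']} \quad \text{(Consequence)}$$

$$\dfrac{I \vdash [P]\ C\ [Q] \quad x \notin \mathrm{fv}(C)}{I \vdash [\exists x.P]\ C\ [\exists x.Q]} \quad \text{(Existential)}$$

$$\dfrac{I \vdash [P]\ C\ [Q_1] \quad I \vdash [P]\ C\ [Q_2] \quad I\ \text{precise}}{I \vdash [P]\ C\ [Q_1 \wedge Q_2]} \quad \text{(Conjunction)}$$

Fig. 9. Derivation rules. fv gives the set of free variables in a command or predicate, wr gives the set of variables which are modified by a command.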
We further extend $\oplus$ to permission heaps $H_1$ and $H_2$ that agree on the values at overlapping locations:

$$
(H_1 \oplus H_2)(\ell) =
\begin{cases}
H_1(\ell) \oplus H_2(\ell) & \text{if } \ell \in \mathrm{dom}(H_1) \cap \mathrm{dom}(H_2) \\
H_1(\ell) & \text{if } \ell \in \mathrm{dom}(H_1) \setminus \mathrm{dom}(H_2) \\
H_2(\ell) & \text{otherwise}
\end{cases}
$$

Given this, we model assertions in the logic with permission heaps (see Figure 7). As in Vafeiadis [26], assertions are modeled with permission heaps but the operational semantics act on concrete heaps. To reconcile this, we consider regular heaps as a subset of permission heaps where the permission is always $\top$:

$$\mathit{Heap} = \mathit{Loc} \rightharpoonup_{\mathrm{fin}} \mathit{Val} \times \{\top\}$$

Then, for any permission heap $H$ there exists a complementary permission heap $H'$ for which $H \oplus H'$ is a concrete heap. Specifically, $H'$ must contain the sub-heap of $H$ that includes all the locations at which $H$ has read permission. Define $\mathit{read}(H)$ to be such a sub-heap:

$$\mathit{read}(H) = \{(v, p) \mid (v, p) \in H \wedge p = r\}$$

Then,

$$\forall H \in \mathit{PHeap}.\ H \oplus \mathit{read}(H) \in \mathit{Heap}$$
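The composition and the `read` sub-heap can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the permission model of Figure 7 is not reproduced here, so the sketch assumes just two permissions, a read share `r` and the total permission `T`, where two read shares on the same value combine to `T` and nothing else composes. The names `join`, `join_cell`, `read` and `is_concrete` are hypothetical.

```python
from typing import Dict, Optional, Tuple

READ, FULL = "r", "T"                     # assumed permissions: read share and total
Cell = Tuple[int, str]                    # (value, permission)
PHeap = Dict[int, Cell]                   # location -> cell


def join_cell(c1: Cell, c2: Cell) -> Optional[Cell]:
    """Compose two cells stored at the same location; None means undefined."""
    (v1, p1), (v2, p2) = c1, c2
    # Assumption: only two read shares that agree on the value compose.
    if v1 == v2 and p1 == READ and p2 == READ:
        return (v1, FULL)
    return None


def join(h1: PHeap, h2: PHeap) -> Optional[PHeap]:
    """Pointwise composition following the three cases of the definition."""
    out = dict(h1)                        # locations only in h1 keep h1's cell
    for loc, cell in h2.items():
        if loc in out:                    # overlapping location: compose cells
            merged = join_cell(out[loc], cell)
            if merged is None:
                return None
            out[loc] = merged
        else:                             # otherwise take h2's cell
            out[loc] = cell
    return out


def read(h: PHeap) -> PHeap:
    """Sub-heap of the locations at which h holds only read permission."""
    return {loc: cell for loc, cell in h.items() if cell[1] == READ}


def is_concrete(h: PHeap) -> bool:
    """A concrete heap carries the total permission everywhere."""
    return all(perm == FULL for _, perm in h.values())
```

Under these assumptions, composing any permission heap with its own `read` part yields a concrete heap, which mirrors the closing statement above.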

Now consider Definition 5 of $\mathsf{safe}_n(C, \sigma, I, Q)$. In the definition, every time the small-step judgement $\to$ is invoked, the heap is $H \oplus H_I \oplus H_F$, which includes the universally quantified $H_F$. Thus, $H_F$ will always include $\mathit{read}(H) \oplus \mathit{read}(H_I)$. This means that $H \oplus H_I \oplus H_F$ is a concrete heap, so the definition makes sense. Furthermore, since $H_F$ is not modified by the step in condition (3), $C$ cannot modify locations in the heap to which $H$ has only read access.

Precise Assertions: As shown in [14], [26], in order for the logic to be sound, we require that the global resource invariant be precise in the Conjunction rule. We define precise assertions [37], [14], [26] as follows. An assertion $P$ is precise when it is satisfied by at most one sub-heap of any heap.

Definition 6 (Precise Assertions). Let $V \in \mathit{Stack}$, $t \in \mathbb{N}$ and let $P$ be an assertion. $P$ is precise if and only if for all $H_1, H_2, H_1', H_2' \in \mathit{PHeap}$ such that $H_1 \oplus H_2$ is defined and $H_1 \oplus H_2 = H_1' \oplus H_2'$, if $(H_1, V, t) \models P$ and $(H_1', V, t) \models P$, then $H_1 = H_1'$.
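To make the definition tangible, the following Python sketch checks precision by brute force on one small heap. It is a simplification and not the formal definition: permissions, the stack and the token count are ignored, heaps are plain dictionaries, and an assertion is modeled as a hypothetical Python predicate over a sub-heap.

```python
from itertools import combinations


def subheaps(heap):
    """Enumerate every sub-heap (restriction to a subset of the locations)."""
    locs = sorted(heap)
    for k in range(len(locs) + 1):
        for chosen in combinations(locs, k):
            yield {loc: heap[loc] for loc in chosen}


def precise_on(assertion, heap):
    """True if at most one sub-heap of `heap` satisfies `assertion`."""
    witnesses = [h for h in subheaps(heap) if assertion(h)]
    return len(witnesses) <= 1


def points_to_10(h):
    """Models `10 |-> _`: the sub-heap is exactly the single cell 10."""
    return set(h) == {10}


def some_cell(h):
    """Models `exists x. x |-> _`: the sub-heap is any single cell."""
    return len(h) == 1
```

On the heap `{10: 1, 11: 2}` the first predicate picks out a unique sub-heap, whereas the second is satisfied by two different sub-heaps and is therefore not precise.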

This  definition  does  not  consider  our  consumable  resources. 
Since  the  assertion  is  satisfiable  by  any  set  of  at  least  k 
resources,  it  is  impossible  for  a  predicate  to  specify  an  exact 
set  of  resources.  Regardless,  since  the  tokens  are  affine  entities 
as  we  see  in  the  following  proof,  the  soundness  of  the  logic  is 
unaffected. 

Before  we  prove  the  soundness  of  the  rules,  we  have  to 
prove  two  additional  lemmas  that  are  needed  in  the  case  of 
the  rule  While. 

Lemma 2. Let $\sigma = (H, V, t)$ be a program state, $I, Q, R$ predicates, $C_1, C_2$ commands, and $n$ a natural number. If $\mathsf{safe}_n(C_1, \sigma, I, Q)$ and for all $m \le n$ and $\sigma'$ with $\sigma' \models Q$, $\mathsf{safe}_m(C_2, \sigma', I, R)$, then $\mathsf{safe}_n(C_1; C_2, \sigma, I, R)$.


The  proof  of  Lemma  2  is  identical  to  the  proof  of  the  same 
lemma  in  (the  formalized  proof  of)  [26]. 

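Lemma 3. Let $\sigma = (H, V, t)$ be a program state, $\mathtt{while}\ B\ \mathtt{do}\ C$ a command, and $I, P, P'$ predicates such that $P \wedge B \Rightarrow P' * \Diamond$. If $[P']\ C\ [P]$ is satisfiable under $I$ and $\sigma \models P$ then for all $n$, $\mathsf{safe}_n(\mathtt{while}\ B\ \mathtt{do}\ C, \sigma, I, P \wedge \neg B)$.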

Proof. We prove the lemma by induction on $n$. The base case for $n = 0$ follows directly from the definition of $\mathsf{safe}_0$.

Assume now that $\mathsf{safe}_n(\mathtt{while}\ B\ \mathtt{do}\ C, \sigma, I, P \wedge \neg B)$ holds. To show $\mathsf{safe}_{n+1}(\mathtt{while}\ B\ \mathtt{do}\ C, \sigma, I, P \wedge \neg B)$, we show that all three conditions in Definition 5 are satisfied:

1) We have $\mathtt{while}\ B\ \mathtt{do}\ C \neq \mathtt{skip}$, so this condition holds vacuously.

2) The only rule in the operational semantics which can derive $(\mathtt{while}\ B\ \mathtt{do}\ C, \sigma) \to \bot$ is While-Abort. The premises of the rule are $[\![B]\!](V)$ and $t = 0$. We show that from $[\![B]\!](V)$ it follows that $t > 0$. Therefore While-Abort does not apply. Assume $[\![B]\!](V)$. Since $\sigma \models P$ we then have $\sigma \models P \wedge B$, and thus it follows from the premises that $\sigma \models P' * \Diamond$. From the semantics of $\Diamond$ we derive $t \ge 1$. This confirms condition (2).

3) Let $t_I, t' \in \mathbb{N}$, $H_I, H_F \in \mathit{PHeap}$, $H' \in \mathit{Heap}$ and $V' \in \mathit{Stack}$ such that $(H_I, V, t_I) \models I$ and $H \oplus H_I \oplus H_F$ is defined, and $(\mathtt{while}\ B\ \mathtt{do}\ C, (H \oplus H_I \oplus H_F, V, t + t_I)) \to (C', (H', V', t'))$. Then one of the rules While-Loop or While-Skip has been applied. Since neither of these rules modifies the heap (nor the stack), $H' = H \oplus H_I \oplus H_F$, so let $H'' = H$ and $H_I' = H_I$.

In the case of the rule While-Loop, we have $C' = C; \mathtt{while}\ B\ \mathtt{do}\ C$ and $t' = t + t_I - 1$. Let now $t'' = t_I$. Then $t \ge 1$ (premise of While-Loop) and we have $t - 1 \ge 0$, so $t'' \le t'$. It follows by construction that $(H_I', V', t'') = (H_I, V, t_I)$, which satisfies $I$. Moreover $(H'', V', t' - t'') \models P'$. Since $[P']\ C\ [P]$ is satisfiable under $I$, we have $\mathsf{safe}_n(C, (H'', V', t' - t''), I, P)$. By induction we have $\mathsf{safe}_m(\mathtt{while}\ B\ \mathtt{do}\ C, \sigma', I, P \wedge \neg B)$ for all $m \le n$ and all $\sigma'$ with $\sigma' \models P$. Therefore we derive $\mathsf{safe}_n(C', (H'', V', t' - t''), I, P \wedge \neg B)$ with Lemma 2.

In the case of the rule While-Skip, we have $C' = \mathtt{skip}$ and $t' = t + t_I$. Let again $t'' = t_I$. Then $t'' \le t'$ and it follows that $(H_I', V', t'') = (H_I, V, t_I)$ satisfies $I$. Furthermore, $(H'', V', t' - t'') \models P$ and from the premise of While-Skip we obtain $(H'', V', t' - t'') \models P \wedge \neg B$. Thus $\mathsf{safe}_n(\mathtt{skip}, (H'', V', t' - t''), I, P \wedge \neg B)$. (Condition (1) follows from the aforesaid and Conditions (2) and (3) by inspection of the evaluation rules.)

□
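The role the token count plays in the proof can be seen in a few lines of Python. This is a minimal sketch of the three while rules as they are described above, not the formal semantics: the loop body is assumed to be a pure, non-aborting function on an abstract state that neither consumes nor produces tokens, and the names `run_while`, `guard` and `body` are hypothetical.

```python
def run_while(guard, body, state, tokens):
    """Evaluate `while guard do body` with a token budget.

    Returns (outcome, state, tokens), where outcome is "skip" for normal
    termination and "abort" for the While-Abort case.
    """
    while True:
        if not guard(state):
            return ("skip", state, tokens)    # While-Skip: guard is false
        if tokens == 0:
            return ("abort", state, tokens)   # While-Abort: guard true, t = 0
        tokens -= 1                           # While-Loop: t becomes t - 1
        state = body(state)                   # then run the body once
```

A loop that needs three iterations terminates with exactly three tokens and aborts with two, which is the situation condition (2) of the proof rules out by deriving $t \ge 1$ whenever the guard holds.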

Theorem 4 (Partial Correctness). For any propositions $I, P, Q$ and any command $C$, if $I \vdash [P]\ C\ [Q]$, then $[P]\ C\ [Q]$ is satisfiable under $I$.

Proof. The proof is by structural induction over the derivation rules given in Figure 9. Since the only command which accesses the resource component of program state is the while loop, the proof of every rule is essentially the same as in Vafeiadis [26] except for the rule While. For all of the following, let $\sigma = (H, V, t) \in \mathit{State}$, $C$ be a command, and $I, P, Q$ be predicates.

While:  Follows  directly  from  Lemma  3. 

Conjunction: To see that the definition of precise assertions is sufficient, we consider the Conjunction rule. Let $I$ be a precise assertion, $Q_1, Q_2$ be any assertions, and let $C$ be a command. We show by induction that for any state $\sigma = (H, V, t)$ and any $n \in \mathbb{N}$, if $\mathsf{safe}_{n+1}(C, \sigma, I, Q_1)$ and $\mathsf{safe}_{n+1}(C, \sigma, I, Q_2)$ then $\mathsf{safe}_{n+1}(C, \sigma, I, Q_1 \wedge Q_2)$. Again, we confirm each condition:

1) If $C = \mathtt{skip}$, then $\sigma \models Q_1$ and $\sigma \models Q_2$. Thus, $\sigma \models Q_1 \wedge Q_2$.

2) Since this condition does not depend on the postcondition, it is already verified by the assumption $\mathsf{safe}_{n+1}(C, \sigma, I, Q_1)$.

3) Let $t_I, t' \in \mathbb{N}$, $H_I, H_F \in \mathit{PHeap}$, $H' \in \mathit{Heap}$ and $V' \in \mathit{Stack}$ such that $(H_I, V, t_I) \models I$ and $H \oplus H_I \oplus H_F$ is defined, and assume that $(C, (H \oplus H_I \oplus H_F, V, t + t_I)) \to (C', (H', V', t'))$.

By our assumption, there exist $H''_1$, $H'_{I,1}$ and $t''_1$ such that $H' = H''_1 \oplus H'_{I,1} \oplus H_F$, $t''_1 \le t'$, $(H'_{I,1}, V', t''_1) \models I$ and $\mathsf{safe}_n(C', (H''_1, V', t' - t''_1), I, Q_1)$. Likewise for $H''_2$, $H'_{I,2}$, $t''_2$, and $Q_2$.

This implies that $H''_1 \oplus H'_{I,1} = H''_2 \oplus H'_{I,2}$. Since $I$ is precise, we know that $H'_{I,1} = H'_{I,2}$ and thus $H''_1 = H''_2$. Finally, let $t'' = \min(t''_1, t''_2)$. Then, $t' - t''$ will be at least as large as both $t' - t''_1$ and $t' - t''_2$ and will thus be sufficient to ensure that both $Q_1$ and $Q_2$ hold if the execution terminates. We conclude that $\mathsf{safe}_n(C', (H''_1, V', t' - t''), I, Q_1 \wedge Q_2)$.

□ 

The  total  correctness  of  the  logic  is  a  direct  consequence  of 
Theorem  4  and  Theorem  2. 

Theorem 5 (Total Correctness). Let $I, P, Q$ be propositions, $C$ be a command, and $\sigma$ be a program state. If $\sigma \models P * I$ and $I \vdash [P]\ C\ [Q]$ then every evaluation of $C$ from the initial state $\sigma$ terminates in a state $\sigma'$ with $\sigma' \models Q * I$.

III. Memory Safety of Treiber's Stack

To  additionally  verify  memory  safety,  we  have  to  add 
some  auxiliary  state  and  extend  our  resource  invariant.  The 
verification  is  then  similar  to  the  proofs  in  related  work  on 
verification  of  safety  properties  [32],  [33].  However,  there  are 
synergies  between  the  lock-freedom  and  the  memory  safety 
proof. 

See Figure 10 for the full implementation of Treiber's stack in our while language. The crucial points in the verification of memory safety are the assignments x := [t+1] and ret_val := [t] in the method pop. Our goal is to ensure, using the resource invariant, that these locations are owned by the shared region. At the evaluation of each assignment there are two possible cases: either the memory location that is read is still part of the stack S or it has been removed from the stack by another thread. To keep track of the memory locations that are pointed to by the stack, we introduce an inductive list predicate to describe the list pointed to by S. To keep track of the locations that have been removed from the stack we introduce an auxiliary variable that points to a second stack G that contains all the locations that have been removed from S. To this end, we push a node onto G after it is removed from S. That is, we replace the last atomic block in pop with the following code.

  atomic {                      // popped := CAS(S,t,x)
    s := [S];
    if s == t then {
      [S] := x;
      popped := true;
      g := [G];                 // push t onto G
      [t+1] := g;
      [G] := t
    } else skip;
    C[tid] := false
  }

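The invariant $I$ is then extended as follows, where $n$ is again the total number of threads and $\alpha(i, u)$ is defined as before.

$$
I' = \exists u.\ S \mapsto u * \mathop{\circledast}_{0 \le i < n} \alpha(i, u) * G \mapsto v * \big(\exists u', v'.\ \mathit{LSeg}(u, u') * \mathit{LSeg}(v, v')\big) \wedge \bigwedge_{0 \le i < n} \beta(i, u, v)
$$

$$
\beta(i, u, v) = \exists a, c.\ C[i] \mapsto c * A[i] \mapsto a * \big(c = 0 \vee \mathit{LSeg}(u, a) \vee \mathit{LSeg}(v, a)\big)
$$

The inductive list predicate LSeg is defined as usual [22] by

$$
\mathit{LSeg}(x, y) \Leftrightarrow (x = y \wedge \mathit{emp}) \vee (\exists v, z.\ x \mapsto v * x + 1 \mapsto z * \mathit{LSeg}(z, y))
$$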

The invariant ensures for each thread which is in the critical section that the local variable t points to a location that is used by the lists pointed to by S and G. Note that we can reuse the auxiliary arrays A and C in the formulas $\beta(i, u, v)$.
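The effect of the modified atomic block can be replayed on an explicit heap. The Python sketch below is only an illustration: the heap is a dictionary from addresses to values, a node at address `a` stores its value at `a` and its next pointer at `a + 1`, the address `0` plays the role of the null pointer, and the helper names `cas_pop` and `list_nodes` are hypothetical.

```python
def cas_pop(heap, S, G, t, x):
    """Model of the atomic block: CAS(S, t, x), then push t onto G on success."""
    if heap[S] == t:
        heap[S] = x                 # unlink the top node from the stack
        heap[t + 1] = heap[G]       # link the removed node in front of G's list
        heap[G] = t
        return True
    return False


def list_nodes(heap, start):
    """Addresses of the nodes on the null-terminated list beginning at `start`."""
    nodes = []
    while start != 0:
        nodes.append(start)
        start = heap[start + 1]
    return nodes


# Stack S -> 10 -> 20, garbage list G empty.
heap = {1: 10, 2: 0, 10: "a", 11: 20, 20: "b", 21: 0}
S, G = 1, 2
t = heap[S]                         # both threads read the same top pointer
x = heap[t + 1]
first = cas_pop(heap, S, G, t, x)   # thread 1 succeeds
second = cas_pop(heap, S, G, t, x)  # thread 2 fails with its stale t
```

After the first thread succeeds, the stale pointer `t` of the second thread is no longer on the list of S but it is on the list of G, so the cells `[t]` and `[t+1]` remain allocated, which is the disjunction the formulas $\beta(i, u, v)$ record.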

IV. Verification of Hendler et al.'s Elimination-Backoff Stack

To  improve  the  performance  of  Treiber’s  non-blocking  stack 
in  the  presence  of  high  contention,  one  can  use  an  elimination 
backoff  scheme  [9].  The  idea  is  based  on  the  observation  that 
a  push  operation  followed  by  a  pop  results  in  a  stack  that 
is  identical  to  the  initial  stack.  So,  if  a  stack  operation  fails 
because  of  the  interference  of  another  thread  then  the  executing 
thread  does  not  immediately  retry  the  operation.  Instead,  it 
checks  if  there  is  another  thread  that  is  trying  to  perform 
a  complementary  operation.  In  this  case,  the  two  operations 
can  be  eliminated  without  accessing  the  stack  at  all:  The  two 
threads  use  a  different  shared-memory  cell  to  transfer  the  stack 
element. 

Our  method  can  also  be  used  to  prove  that  Hendler  et  al’s 
elimination-backoff  stack  [9]  is  lock-free.  The  main  challenge 
in  the  proof  is  that  the  push  and  pop  operations  consist  of  two 
nested  loops  that  are  guarded  by  CAS  operations.  Assume  again 
a  system  with  n  threads.  The  inner  loop  can  be  just  treated  as  in 
Treiber’s  stack  using  n  tokens  in  the  precondition  and  0  tokens 
in  the  postcondition.  As  a  result,  the  number  of  tokens  needed 
S := alloc(1);                    // initialization
[S] := 0;
A := alloc(max_tid);              // auxiliary arrays
C := alloc(max_tid);              // initialized to 0

push(v) ≜
  pushed := false;
  x := alloc(2);
  [x] := v;
  while ( !pushed ) do {
    atomic {
      t := [S];                   // expect t = [S]
      C[tid] := 1;                // critical state starts
      A[tid] := t
    };
    [x+1] := t;
    atomic {                      // pushed := CAS(S,t,x)
      s := [S];
      if s == t then {
        [S] := x;
        pushed := true;
      } else skip;
      C[tid] := 0                 // critical state ends
    };
    consume(1)
  }

pop() ≜
  popped := false;
  while ( !popped ) do {
    atomic {
      t := [S];                   // assume t = [S]
      C[tid] := 1;                // critical state starts
      A[tid] := t
    };
    if t == 0 then {              // empty stack
      ret_val := 0;
      popped := true
    } else {
      x := [t+1];
      ret_val := [t];
      atomic {                    // popped := CAS(S,t,x)
        s := [S];
        if s == t then {
          [S] := x;
          popped := true;
        } else skip;
        C[tid] := false           // critical state ends
      };
      consume(1)
    }
  };
  return := ret_val;

Fig. 10. A full implementation of Treiber's lock-free stack in our while language.


for an iteration of the outer loop is $n + 1$. That means that a successful thread needs to transfer $(n - 1) \cdot (n + 1) = n^2 - 1$ tokens to the other threads to account for additional loop iterations in the other threads. Given this, we can verify the elimination-backoff stack using tokens in the precondition. Technically, we need an invariant of the form $I * J$, where $I$ is an invariant like in Treiber's stack (for the inner loop) and $J$ is like $I$ but with every token $\Diamond$ replaced by $\Diamond^n$.
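The token arithmetic can be checked mechanically. The following Python sketch only restates the counting argument above; the function names are hypothetical and nothing beyond the two quantities named in the text is modeled.

```python
def outer_iteration_cost(n: int) -> int:
    """Tokens needed for one iteration of the outer loop with n threads:
    n for the inner loop plus one for the outer iteration itself."""
    return n + 1


def transfer_on_success(n: int) -> int:
    """Tokens a successful thread hands to the other n - 1 threads,
    one extra outer iteration for each of them."""
    return (n - 1) * outer_iteration_cost(n)
```

For example, with `n = 4` threads a successful operation has to pay for `3 * 5 = 15 = 4**2 - 1` tokens on behalf of the other threads.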


To make this reasoning more concrete, Figure 11 shows the loop structure of the push operation of Hendler et al.'s stack with elimination scheme in our while language. The auxiliary arrays A1 and C1 have the same purpose as in Treiber's stack: C1[tid] indicates whether thread tid is making an assumption on the value of the stack pointer S, and A1[tid] contains the value of the local variable t. They will be used to formulate the part of the global invariant that is crucial to maintain the loop invariant of the outer while loop. The inner while loop has the same structure as the outer loop since it is also guarded by a CAS operation. However, the address on which the CAS is performed is not fixed. Thus we need three additional auxiliary arrays to formulate the part of the global invariant that is needed for the inner loop: C2[tid] indicates whether thread tid is making an assumption on the shared state that is stored in otherT. The array B2 stores the memory address that is affected by this assumption and the array A2 stores what the assumption is.


The global invariant $I$ can then be defined as follows.

/  =  3u,vi, . . .  ,Vn- S  1-^  u  *  @ 

0<i<n 

*(  ®  col[j]i-^  _*  @  B2[i]dA_A 

0<j<m  0<i<n 

A 

0<z<n 

$$\delta(i, u) = \exists a, c.\ C1[i] \mapsto c * A1[i] \mapsto a * (c = 0 \vee a = u \vee \Diamond^n)$$
$$\zeta(i) = \exists a, c.\ C2[i] \mapsto c * A2[i] \mapsto a * (c = 0 \vee a = v_i \vee \Diamond)$$
=  36.  0  <  6  <  TO  *  B2[i\  lA  b  *  col[b]  i— >■  Vi 
The formulas $\delta(i, u)$ are similar to the formulas $\alpha(i, u)$ in the invariant that we used to verify Treiber's stack. However, the single token $\Diamond$ is replaced by $n$ tokens $\Diamond^n$. The formulas $\zeta(i)$ are based on the same idea but are a bit more complicated since it is dynamically decided to which memory cell the assumption of thread $i$ applies (namely, col[b] contains the value that is stored in the thread-local variable otherT). The formulas form an invariant that relates the variables $v_i$ to the value stored in col[B2[i]]. The loop invariant of the outer loop is

{pushed  V  0"'")  *  Al[Ud]  lA  _  *  Cl[tid]  lA  _ 

*  A2[tid]  _  *  B2[tid]  _  *  C2[tid]  _ 

The loop invariant of the inner loop is

$$(\mathit{matched} \vee \Diamond^n) * A2[tid] \mapsto \_ * B2[tid] \mapsto \_ * C2[tid] \mapsto \_ .$$

The  proof  is  similar  to  the  proof  of  Treiber’s  stack. 
S := alloc(1);                    // initialization
[S] := 0;
col := alloc                      // elimination array

A1 := alloc(max_tid);             // auxiliary arrays
C1 := alloc(max_tid);             // initialized to 0
A2 := alloc(max_tid);
B2 := alloc(max_tid);
C2 := alloc(max_tid);

push(v) ≜
  pushed := false;
  // ...
  while ( !pushed ) do {
    atomic {
      t := [S];                   // expect t = [S]
      C1[tid] := 1;               // critical state 1 starts
      A1[tid] := t
    };
    // ...
    atomic {                      // pushed := CAS(S,t,x)
      s := [S]; if s == t then {
        [S] := x;
        pushed := true;
      } else skip;
      C1[tid] := 0                // critical state 1 ends
    };
    if !pushed then {             // elimination scheme
      // ...
      atomic {
        pos := GetPosition(...);
        B2[tid] := pos;
      };
      matched := false;
      while ( !matched ) do {
        atomic {
          otherT := col[pos];     // expectation
          C2[tid] := 1;           // critical state 2 starts
          A2[tid] := otherT
        };
        // ...
        atomic {                  // pushed := CAS(col+pos, otherT, tid)
          c := col[pos]; if c == otherT then {
            col[pos] := tid;
            matched := true;
          } else skip;
          C2[tid] := 0            // critical state 2 ends
        };
        // ...
      }
    }
  }

Fig. 11. The loop structure of the push operation of Hendler et al.'s stack with elimination backoff scheme [9].
Abstract. Implementations of concurrent objects should guarantee linearizability and a progress property such as wait-freedom, lock-freedom, obstruction-freedom, starvation-freedom, or deadlock-freedom. Conventional informal or semi-formal definitions of these progress properties describe conditions under which a method call is guaranteed to complete, but it is unclear how these definitions can be utilized to formally verify system software in a layered and modular way.

In this paper, we propose a unified framework based on contextual refinements to show exactly how progress properties affect the behaviors of client programs. We give formal operational definitions of all common progress properties and prove that for linearizable objects, each progress property is equivalent to a specific type of contextual refinement that preserves termination. The equivalence ensures that verification of such a contextual refinement for a concurrent object guarantees both linearizability and the corresponding progress property. Contextual refinement also enables us to verify safety and liveness properties of client programs at a high abstraction level by soundly replacing concrete method implementations with abstract atomic operations.


1  Introduction 

A  concurrent  object  consists  of  shared  data  and  a  set  of  methods  that  provide 
an  interface  for  client  threads  to  manipulate  and  access  the  shared  data.  The 
synchronization  of  simultaneous  data  access  within  the  object  affects  the  progress 
of  the  execution  of  the  client  threads  in  the  system. 

Various progress properties have been proposed for concurrent objects. The most important ones are wait-freedom, lock-freedom and obstruction-freedom for non-blocking implementations, and starvation-freedom and deadlock-freedom for lock-based implementations. These properties describe conditions under which method calls are guaranteed to successfully complete in an execution. For example, lock-freedom guarantees that "infinitely often some method call finishes in a finite number of steps" [9].

Nevertheless, the common informal or semi-formal definitions of the progress properties are difficult to use in a modular and layered program verification because they fail to describe how the progress properties affect clients. In a modular verification of client threads, the concrete implementation $\Pi$ of the object methods should be replaced by an abstraction (or specification) $\Pi_A$ that consists of equivalent atomic methods. The progress properties should then characterize whether and how the behaviors of a client program will be affected if a client uses $\Pi$ instead of $\Pi_A$. In particular, we are interested in systematically studying whether the termination of a client using the abstract methods $\Pi_A$ will be preserved when using an implementation $\Pi$ with some progress guarantee.

Previous work on verifying the safety of concurrent objects (e.g., [4, 12]) has shown that linearizability, a standard safety criterion for concurrent objects, and contextual refinement are equivalent. Informally, an implementation $\Pi$ is a contextual refinement of a (more abstract) implementation $\Pi_A$, if every observable behavior of any client program using $\Pi$ can also be observed when the client uses $\Pi_A$ instead. To obtain equivalence to linearizability, the observable behaviors include I/O events but not divergence (i.e., non-termination). Recently, Gotsman and Yang [6] showed that a client program that diverges using a linearizable and lock-free object must also diverge when using the abstract operations instead. Their work reveals a connection between lock-freedom and a form of contextual refinement which preserves termination as well as safety properties. It is unclear how other progress guarantees affect termination of client programs and how they are related to contextual refinements.

This paper studies all five commonly used progress properties and their relationships to contextual refinements. We propose a unified framework in which a certain type of termination-sensitive contextual refinement is equivalent to linearizability together with one of the progress properties. The idea is to identify different observable behaviors for different progress properties. For example, for the contextual refinement for lock-freedom we observe the divergence of the whole program, while for wait-freedom we also need to observe which threads in the program diverge. For lock-based progress properties, e.g., starvation-freedom and deadlock-freedom, we have to take fair schedulers into account.

Our  paper  makes  the  following  new  contributions: 

- We formalize the definitions of the five most common progress properties: wait-freedom, lock-freedom, obstruction-freedom, starvation-freedom, and deadlock-freedom. Our formulation is based on possibly infinite event traces that are operationally generated by any client using the object.

- Based on our formalization, we prove relationships between the progress properties. For example, wait-freedom implies lock-freedom and starvation-freedom implies deadlock-freedom. These relationships form a lattice shown in Figure 1 (where the arrows represent implications). We close the lattice with a bottom element that we call sequential termination, a progress property in the sequential setting. It is weaker than any other progress property.

- We develop a unified framework to characterize progress properties via contextual refinements. With linearizability, each progress property is proved equivalent to a contextual refinement which takes into account divergence of programs. A companion TR [14] contains the formal proofs of our results.

By extending earlier equivalence results on linearizability [4], our contextual refinement framework can serve as a new alternative definition for the full correctness properties of concurrent objects. The contextual refinement implied by
Wait-freedom

Lock-freedom        Starvation-freedom

Obstruction-freedom        Deadlock-freedom

Sequential termination

Fig. 1: Relationships between Progress Properties


linearizability and a progress guarantee precisely characterizes the properties at the abstract level that are preserved by the object implementation. When proving these properties of a client of the object, we can soundly replace the concrete method implementations by their abstract operations. On the other hand, since the contextual refinement also implies linearizability and the progress property, we can potentially borrow ideas from existing proof methods for contextual refinements, such as simulations (e.g., [13]) and logical relations (e.g., [2]), to verify linearizability and the progress guarantee together.

In the remainder of this paper, we first informally explain our framework in Section 2. We then introduce the formal setting in Section 3, including the definition of linearizability as the safety criterion of objects. We formulate the progress properties in Section 4 and the contextual refinement framework in Section 5. We discuss related work and conclude in Section 6.

2  Informal  Account 

In  this  section,  we  informally  describe  our  results.  We  first  give  an  overview  of 
linearizability  and  its  equivalence  to  the  basic  contextual  refinement.  Then  we 
explain  the  progress  properties  and  summarize  our  new  equivalence  results. 


Linearizability and Contextual Refinement. Linearizability is a standard safety criterion for concurrent objects [9]. Intuitively, linearizability describes atomic behaviors of object implementations. It requires that each method call should appear to take effect instantaneously at some moment between its invocation and return.
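The requirement can be tested by brute force on very small histories. The Python sketch below is an illustration only and is not the formal definition used later in the paper: it assumes a hypothetical fetch-and-increment counter that starts at 0 and returns the value it read, and it represents each completed call as a tuple `(invoke_time, return_time, result)`.

```python
from itertools import permutations


def respects_real_time(order):
    """If a call returned before another was invoked, it must come first."""
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            # order[j] is placed later but finished before order[i] started
            if order[j][1] < order[i][0]:
                return False
    return True


def is_legal_sequential(order):
    """Sequential fetch-and-increment from 0 returns 0, 1, 2, ... in order."""
    return all(call[2] == k for k, call in enumerate(order))


def linearizable(history):
    """Search for one total order that is legal and respects real time."""
    return any(
        respects_real_time(order) and is_legal_sequential(order)
        for order in permutations(history)
    )
```

Two overlapping calls may take effect in either order, so `[(0, 3, 1), (1, 2, 0)]` is accepted, whereas `[(0, 1, 1), (2, 3, 0)]` is rejected because the call that finished first cannot have observed the later increment. The search is exponential in the number of calls and is meant for hand-sized examples only.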

Linearizability intuitively establishes a correspondence between the object implementation $\Pi$ and the intended atomic operations $\Pi_A$. This correspondence can also be understood as a contextual refinement. Informally, we say that $\Pi$ is a contextual refinement of $\Pi_A$, written $\Pi \sqsubseteq \Pi_A$, if substituting $\Pi$ for $\Pi_A$ in any context (i.e., in a client program) does not add observable behaviors. External observers cannot tell that $\Pi_A$ has been replaced by $\Pi$ from monitoring the behaviors of the client program.

It has been proved [4, 12] that linearizability is equivalent to a contextual refinement in which the observable behaviors are finite traces of I/O events. Thus this basic contextual refinement can be used to distinguish linearizable objects from non-linearizable ones. But it cannot characterize progress properties of objects.

Progress Properties. Figure 2 shows several implementations of a counter with different progress guarantees that we study in this paper. A counter object provides the two methods inc and dec for incrementing and decrementing a shared variable x. The implementations given here are not intended to be practical but merely to demonstrate the meanings of the progress properties. We assume that every command is executed atomically.

Informally, an object implementation is wait-free, if it guarantees that every thread can complete any started operation of the data structure in a finite number of steps [7]. Figure 2(a) shows an ideal wait-free implementation in which the increment and the decrement are done atomically. This implementation is obviously wait-free since it guarantees termination of every method call regardless of interference from other threads. Note that realistic implementations of wait-free counters are more complex and involve arrays and atomic snapshots [1].

Lock-freedom is similar to wait-freedom but only guarantees that some thread will complete an operation in a finite number of steps [7]. Typical lock-free implementations (such as the well-known Treiber stack, HSY elimination-backoff stack and Harris-Michael lock-free list) use the atomic compare-and-swap instruction cas in a loop to repeatedly attempt an update until it succeeds. Figure 2(b) shows such an implementation of the counter object. It is lock-free, because whenever inc and dec operations are executed concurrently, there always exists some successful update. Note that this object is not wait-free. For the following program (2.1), the cas instruction in the method called by the left thread may continuously fail due to the continuous updates of x made by the right thread.

    inc();  ∥  while (true) inc();        (2.1)
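The retry loop of the lock-free counter can be sketched in Python. This is a model under stated assumptions rather than a lock-free program: Python offers no hardware compare-and-swap, so the hypothetical `Cell.cas` method uses an internal lock purely to make the compare and the write one atomic step, and the names `Cell`, `inc` and `run` are not taken from the paper.

```python
import threading


class Cell:
    """A shared variable with a simulated atomic compare-and-swap."""

    def __init__(self, value=0):
        self.value = value
        self._guard = threading.Lock()   # models the atomicity of cas only

    def cas(self, expected, new):
        with self._guard:
            if self.value == expected:
                self.value = new
                return True
            return False


def inc(x):
    """Retry loop: read x, then try to install x + 1. Returns the retry count."""
    retries = 0
    while True:
        t = x.value
        if x.cas(t, t + 1):
            return retries
        retries += 1                     # another thread updated x in between


def run(n_threads, calls_per_thread):
    """Run concurrent increments and return the final counter value."""
    x = Cell(0)

    def worker():
        for _ in range(calls_per_thread):
            inc(x)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return x.value
```

Every failed `cas` in one thread is caused by a successful update in another, which is why the object as a whole always makes progress even though an individual call, like the left thread in (2.1), may retry arbitrarily often.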

Herlihy et al. [8] propose obstruction-freedom which "guarantees progress for any thread that eventually executes in isolation" (i.e., without other active threads in the system). They present two double-ended queues as examples. In Figure 2(c) we show an obstruction-free counter that may look contrived but nevertheless illustrates the idea of the progress property.

The implementation introduces a variable i, and lets inc perform the atomic increment after increasing i to 10 and dec do the atomic decrement after decreasing i to 0. Whenever a method is executed in isolation (i.e., without interference from other threads), it will complete. Thus the object is obstruction-free. It is not lock-free, because for the client

inc()  ;  II  decO  ;  (2.2) 

which executes an increment and a decrement concurrently, it is possible that
neither of the method calls returns. For instance, under a specific schedule, every
increment over i made by the left thread is immediately followed by a decrement
from the right thread.

(a) Wait-Free (Ideal) Impl.

    inc() { x := x + 1; }
    dec() { x := x - 1; }

(b) Lock-Free Impl.

    inc() {
      local t, b;
      do {
        t := x;
        b := cas(&x, t, t+1);
      } while (!b);
    }

(c) Obstruction-Free Impl.

    inc() {
      while (i < 10) {
        i := i + 1;
      }
      x := x + 1;
    }
    dec() {
      while (i > 0) {
        i := i - 1;
      }
      x := x - 1;
    }

(d) Deadlock-Free Impl.

    inc() {
      TestAndSet_lock();
      x := x + 1;
      TestAndSet_unlock();
    }

(e) Starvation-Free Impl.

    inc() {
      Bakery_lock();
      x := x + 1;
      Bakery_unlock();
    }

Fig. 2: Counter Objects with Methods inc and dec
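The schedule that defeats lock-freedom for the counter of Figure 2(c) can be replayed with a small simulation. The sketch below models each thread as a Python generator, where every `yield` is a point at which the scheduler may switch threads. It assumes that one loop test plus one update of i is a single step, which is coarser than the paper's semantics. The `run` driver and the `pick` scheduling functions are illustrative names, not part of the paper.

```python
def inc(state):
    # Figure 2(c): raise i to 10, then increment x.
    while state["i"] < 10:
        state["i"] += 1
        yield                      # the scheduler may switch here
    state["x"] += 1

def dec(state):
    # Figure 2(c): lower i to 0, then decrement x.
    while state["i"] > 0:
        state["i"] -= 1
        yield
    state["x"] -= 1

def run(pick, max_steps=1000):
    """Run client (2.2) for max_steps steps; pick(step) names the thread to run."""
    state = {"x": 0, "i": 0}
    threads = {"inc": inc(state), "dec": dec(state)}
    finished = set()
    for step in range(max_steps):
        name = pick(step)
        if name in finished:
            continue
        try:
            next(threads[name])
        except StopIteration:      # the method call returned
            finished.add(name)
    return finished, state

# Every increment of i is immediately undone: neither call returns.
alternating = run(lambda s: ("inc", "dec")[s % 2])

# inc runs alone from the start, so it completes.
isolated = run(lambda s: "inc")

# After 100 interleaved steps only inc is scheduled: it eventually runs in isolation.
eventually_isolated = run(lambda s: ("inc", "dec")[s % 2] if s < 100 else "inc")
```

Under the alternating schedule no call finishes within the step budget, while inc completes as soon as it runs in isolation. This is the behavior that obstruction-freedom permits and lock-freedom forbids.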


Wait-freedom,  lock-freedom,  and  obstruction-freedom  are  progress  properties 
for  non-blocking  implementations,  where  a  delay  of  a  thread  cannot  prevent  other 
threads  from  making  progress.  In  contrast,  deadlock-freedom  and  starvation- 
freedom  are  progress  properties  for  lock-based  implementations.  A  delay  of  a 
thread  holding  a  lock  will  block  other  threads  which  request  the  lock. 

Deadlock-freedom and starvation-freedom are often defined in terms of locks
and critical sections. Deadlock-freedom guarantees that some thread will succeed
in acquiring the lock, and starvation-freedom states that every thread attempting
to acquire the lock will eventually succeed [9]. For example, a test-and-set spin
lock is deadlock-free but not starvation-free. In a concurrent access, some thread
will successfully set the bit and get the lock, but there might be a thread that
is continuously failing to get the lock. Lamport’s bakery lock is starvation-free.
It ensures that threads can acquire locks in the order of their requests.
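For concreteness, here is a minimal sketch of Lamport's bakery lock in Python. It assumes CPython, where reads and writes of list elements behave as sequentially consistent, and it uses `time.sleep(0)` to yield while spinning. The class name `BakeryLock` and the `run` driver are illustrative and do not come from the paper.

```python
import threading
import time

class BakeryLock:
    """Lamport's bakery lock for n threads with ids 0..n-1 (sketch, assumes CPython)."""
    def __init__(self, n):
        self.choosing = [False] * n
        self.number = [0] * n      # 0 means "not requesting the lock"

    def lock(self, i):
        # Doorway: take a ticket larger than every ticket currently visible.
        self.choosing[i] = True
        self.number[i] = 1 + max(self.number)
        self.choosing[i] = False
        # Wait for every thread holding a smaller (ticket, id) pair.
        for j in range(len(self.number)):
            if j == i:
                continue
            while self.choosing[j]:
                time.sleep(0)
            while self.number[j] != 0 and (self.number[j], j) < (self.number[i], i):
                time.sleep(0)

    def unlock(self, i):
        self.number[i] = 0

def run(n_threads=3, per_thread=30):
    """Each thread increments a shared counter per_thread times under the lock."""
    lock = BakeryLock(n_threads)
    counter = {"x": 0}
    def worker(i):
        for _ in range(per_thread):
            lock.lock(i)
            v = counter["x"]       # deliberately non-atomic read-modify-write
            time.sleep(0)
            counter["x"] = v + 1
            lock.unlock(i)
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return counter["x"]
```

Because tickets are served in increasing order, a thread that has taken a ticket can be overtaken only by threads whose tickets are smaller, which is the intuition behind starvation-freedom. A test-and-set lock has no such ordering, so one thread may lose every race.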

However, as noted by Herlihy and Shavit [10], the above definitions based on
locks are unsatisfactory, because it is often difficult to identify a particular field
in the object as a lock. Instead, they suggest defining them in terms of method
calls. They also notice that the above definitions implicitly assume that every
thread acquiring the lock will eventually release it. This assumption requires fair
scheduling, i.e., every thread eventually gets executed.

Following Herlihy and Shavit [10], we say an object is deadlock-free, if in
each fair execution there always exists some method call that can finish. As
an example in Figure 2(d), we use a test-and-set lock to synchronize the increments
of the counter. Since some thread is guaranteed to acquire the test-and-set
lock, the method call of that thread is guaranteed to finish. Thus the object is
deadlock-free. Similarly, a starvation-free object guarantees that every method
call can finish in fair executions. Figure 2(e) shows a counter implemented with
Lamport’s bakery lock. It is starvation-free since the bakery lock ensures that
every thread can acquire the lock and hence every method call can eventually
complete.

            Wait-Free    Lock-Free    Obstruction-Free     Deadlock-Free    Starvation-Free
  $\Pi_A$   (t, Div.)    Div.         Div.                 Div.             (t, Div.)
  $\Pi$     (t, Div.)    Div.         Div. if Isolating    Div. if Fair     (t, Div.) if Fair

Table 1: Characterizing Progress Properties via Contextual Refinements $\Pi \sqsubseteq \Pi_A$


Our  Results.  None  of  the  above  definitions  of  the  five  progress  properties 
describes  their  guarantees  regarding  the  behaviors  of  client  code.  In  this  paper, 
we  define  several  contextual  refinements  to  characterize  the  effects  over  client 
behaviors  when  the  client  uses  objects  with  some  progress  properties.  We  show 
that  linearizability  together  with  a  progress  property  is  equivalent  to  a  certain 
termination-sensitive  contextual  refinement.  Table  1  summarizes  our  results. 

For each progress property, the new contextual refinement $\Pi \sqsubseteq \Pi_A$ is
defined with respect to a divergence behavior and/or a specific scheduling at the
implementation level (the third row in Table 1) and at the abstract side (the
second row), in addition to the I/O events in the basic contextual refinement for
linearizability.

- For wait-freedom, we need to observe the divergence of each individual thread
  t, represented by “(t, Div.)” in Table 1, at both the concrete and the abstract
  levels. We show that, if the thread t of a client program diverges when the
  client uses a linearizable and wait-free object $\Pi$, then thread t must also
  diverge when using $\Pi_A$ instead.

- The case for lock-freedom is similar, except that we now consider the divergence
  behaviors of the whole client program rather than individual threads
  (denoted by “Div.” in Table 1). If a client diverges when using a linearizable
  and lock-free object $\Pi$, it must also diverge when it uses $\Pi_A$ instead.

- For obstruction-freedom, we consider the behaviors of isolating executions
  at the concrete side (denoted by “Div. if Isolating” in Table 1). In those
  executions, eventually only one thread is running. We show that, if a client
  diverges in an isolating execution when it uses a linearizable and
  obstruction-free object $\Pi$, it must also diverge in some abstract execution.

- For deadlock-freedom, we only care about fair executions at the concrete
  level (denoted by “Div. if Fair” in Table 1).

- For starvation-freedom, we observe the divergence of each individual thread
  at both levels and restrict our considerations to fair executions for the
  concrete side (“(t, Div.) if Fair” in Table 1). Any thread using $\Pi$ can diverge in
  a fair execution, only if it also diverges in some abstract execution.

These  new  contextual  refinements  can  characterize  linearizable  objects  with 
progress  properties.  We  will  formalize  the  results  and  give  examples  in  Section  5. 

3  Formal  Setting  and  Linearizability 

In this section, we formalize linearizability and show its equivalence to a contextual
refinement that preserves safety properties only. This equivalence is the basis
of our new results that relate progress properties and contextual refinements.




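\[
\begin{array}{llcl}
(\mathit{Expr})  & E   & ::= & \dots \\
(\mathit{BExp})  & B   & ::= & \dots \\
(\mathit{Instr}) & c   & ::= & \mathbf{print}(E) \mid \dots \\
(\mathit{Stmt})  & C   & ::= & \mathbf{skip} \mid c \mid x := f(E) \mid \mathbf{return}\ E \mid \mathbf{end} \\
                 &     & \mid & \langle C \rangle \mid C; C \mid \mathbf{if}\ (B)\ C\ \mathbf{else}\ C \mid \mathbf{while}\ (B)\{C\} \\
(\mathit{Prog})  & W   & ::= & \mathbf{skip} \mid \mathbf{let}\ \Pi\ \mathbf{in}\ C \parallel \dots \parallel C \\
(\mathit{ODecl}) & \Pi & ::= & \{ f_1 \leadsto (x_1, C_1), \dots, f_n \leadsto (x_n, C_n) \}
\end{array}
\]

Fig. 3: Syntax of the Programming Language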


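\[
\begin{array}{llcl}
(\mathit{ThrdID}) & t & \in & \mathit{Nat} \\
(\mathit{State})  & S & ::= & \dots \\
(\mathit{Evt})    & e & ::= & (t, f, n) \mid (t, \mathbf{ret}, n) \mid (t, \mathbf{obj}) \mid (t, \mathbf{obj}, \mathbf{abort}) \\
                  &   & \mid & (t, \mathbf{out}, n) \mid (t, \mathbf{clt}) \mid (t, \mathbf{clt}, \mathbf{abort}) \mid (t, \mathbf{term}) \mid (\mathbf{spawn}, n) \\
(\mathit{ETrace}) & T & ::= & \epsilon \mid e :: T \qquad \text{(co-inductive)}
\end{array}
\]

Fig. 4: States and Event Traces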


Language and Semantics. We use a language similar to that in the previous work of
Liang and Feng [12]. As shown in Figure 3, a program W consists of several
client threads that run in parallel. Each thread could call the methods declared
in the object $\Pi$. A method f is defined as a pair (x, C), where x is the formal
argument and C is the method body. The object $\Pi$ could be either concrete
with fine-grained code that we want to verify, or abstract (usually denoted as
$\Pi_A$ in the following) that we consider as the specification. For the latter case,
each method body should be an atomic operation of the form $\langle C \rangle$ and it should
always be safe to execute it. For simplicity, we assume there is only one object
in the program W and each method takes one argument only.

Most commands are standard. Clients can use print(E) to produce observable
external events. We do not allow the object’s methods to produce external
events.  To  simplify  the  semantics,  we  also  assume  there  are  no  nested  method 
calls.  To  discuss  progress  properties  later,  we  introduce  an  auxiliary  command 
end.  It  is  a  special  marker  that  can  be  added  at  the  end  of  a  thread,  but  is  not 
supposed  to  be  used  directly  by  programmers.  The  skip  statement  plays  two 
roles  here:  a  statement  that  has  no  computation  effects  or  a  flag  to  show  the  end 
of  an  execution. 

We use S for a program state. Program transitions $(W, S) \overset{e}{\longmapsto} (W', S')$
generate events e defined in Figure 4. A method invocation event (t, f, n) is produced
when thread t executes x := f(E), where n is the value of the argument E. A
return (t, ret, n) is produced with the return value n. print(E) generates an
output (t, out, n), and end generates a termination marker (t, term). Other steps
generate either normal object actions (t, obj) (for steps inside method calls) or
silent client actions (t, clt) (for client steps other than print(E)). For
transitions leading to the error state abort (e.g., invalid memory access), fault events
are produced: (t, obj, abort) by the object method code and (t, clt, abort) by
the client code. We also introduce an auxiliary event (spawn, n), saying that n
threads are spawned. It will be useful later when defining fair scheduling (in
Section 4). We write tid(e) for the thread ID in the event e. The predicate is_clt(e)
states that the event e is either a silent client action, an output, or a client
fault. We write is_inv(e) and is_ret(e) to denote that e is a method invocation
and a return, respectively. The predicate is_abt(e) denotes a fault of the object
or the client. Method invocations, returns and object faults are called history
events, which will be used to define linearizability below. Outputs, client faults
and object faults are called observable events.

\[
\begin{aligned}
\mathcal{T}[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, T \mid \exists W', S'.\ (W, S) \overset{T}{\longmapsto}{}^{*} (W', S') \ \lor\ (W, S) \overset{T}{\longmapsto}{}^{*} \mathbf{abort} \,\} \\
\mathcal{H}[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, \mathsf{get\_hist}(T) \mid T \in \mathcal{T}[\![W, S]\!] \,\} \\
\mathcal{O}[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, \mathsf{get\_obsv}(T) \mid T \in \mathcal{T}[\![W, S]\!] \,\}
\end{aligned}
\]

Fig. 5: Generation of Finite Event Traces

An event trace T is a finite or infinite sequence of events. We write T(i) for
the i-th event of T. last(T) is the last event in a finite T. The trace T(1..i) is the
sub-trace T(1), ..., T(i) of T, and |T| is the length of T ($|T| = \omega$ if T is infinite).
The trace $T|_t$ represents the sub-trace of T consisting of all events whose thread
ID is t. We can use get_hist(T) to project T to the sub-trace consisting of all the
history events, and get_obsv(T) for the sub-trace of all the observable events.
Finite traces of history events are called histories.
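The projections on finite traces can be sketched in Python. The encoding is an assumption made for illustration: events are tuples whose first component is the thread ID, and invocations carry an explicit "inv" tag, written (t, "inv", f, n) instead of the paper's (t, f, n), so that event kinds are easy to test.

```python
def tid(e):
    return e[0]

def is_inv(e):
    return e[1] == "inv"

def is_ret(e):
    return e[1] == "ret"

def is_out(e):
    return e[1] == "out"

def is_obj_fault(e):
    return e[1:] == ("obj", "abort")

def is_clt_fault(e):
    return e[1:] == ("clt", "abort")

def get_hist(T):
    """History events: invocations, returns and object faults."""
    return [e for e in T if is_inv(e) or is_ret(e) or is_obj_fault(e)]

def get_obsv(T):
    """Observable events: outputs, client faults and object faults."""
    return [e for e in T if is_out(e) or is_clt_fault(e) or is_obj_fault(e)]

def thread_proj(T, t):
    """The sub-trace T|t of all events whose thread ID is t."""
    return [e for e in T if tid(e) == t]

# Thread 1 calls inc(0) while thread 2 prints 7.
T = [(1, "inv", "inc", 0), (1, "obj"), (2, "clt"), (2, "out", 7),
     (1, "ret", 0), (1, "term")]

history = get_hist(T)        # [(1, "inv", "inc", 0), (1, "ret", 0)]
observed = get_obsv(T)       # [(2, "out", 7)]
second = thread_proj(T, 2)   # [(2, "clt"), (2, "out", 7)]
```

Object actions such as (1, "obj") appear in neither projection, which reflects that steps inside a method call are invisible both to histories and to observers.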

In Figure 5, we define $\mathcal{T}[\![W, S]\!]$ for the prefix-closed set of finite traces
produced by the executions of (W, S). We use $(W, S) \overset{T}{\longmapsto}{}^{*} (W', S')$ for zero or
multiple-step program transitions that generate the trace T. We also define
$\mathcal{H}[\![W, S]\!]$ and $\mathcal{O}[\![W, S]\!]$ to get histories and finite observable traces produced by
the executions of (W, S). The TR [14] contains more details about the language.


Linearizability and Basic Contextual Refinement. We formulate linearizability
following its standard definition [11]. Below we sketch the basic concepts.
Detailed formal definitions can be found in the companion TR [14].

Linearizability is defined using histories. We say a return $e_2$ matches an
invocation $e_1$, denoted as match($e_1$, $e_2$), iff they have the same thread ID. An
invocation is pending in T if no matching return follows it. We can use pend_inv(T)
to get the set of pending invocations in T. We handle pending invocations in
a history T in the standard way [11]: we append zero or more return events
to T, and drop the remaining pending invocations. The result is denoted by
completions(T). It is a set of histories, and for each history in it, every
invocation has a matching return event.

Definition 1 (Linearizable Histories). $T \preceq_{\mathrm{lin}} T'$ iff

1. $\forall t.\ T|_t = T'|_t$;

2. there exists a bijection $\pi : \{1, \dots, |T|\} \to \{1, \dots, |T'|\}$ such that
   $\forall i.\ T(i) = T'(\pi(i))$ and
   $\forall i, j.\ i < j \land \mathsf{is\_ret}(T(i)) \land \mathsf{is\_inv}(T(j)) \implies \pi(i) < \pi(j)$.

That is, T is linearizable w.r.t. T' if the latter is a permutation of the former,
preserving the order of events in the same threads and the order of the
non-overlapping method calls. Then an object is linearizable iff each of its concurrent
histories after completions is linearizable w.r.t. some legal sequential history. We
use $\Pi_A \rhd (S_a, T')$ to mean that T' is a legal sequential history generated by any
client using the specification $\Pi_A$ with an abstract initial state $S_a$.

Definition 2 (Linearizability of Objects). The object’s implementation $\Pi$
is linearizable w.r.t. $\Pi_A$ under a refinement mapping $\varphi$, denoted by $\Pi \preceq_{\varphi} \Pi_A$, iff

\[
\begin{aligned}
&\forall n, C_1, \dots, C_n, S, S_a, T.\ \ T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \dots \parallel C_n), S]\!] \land (\varphi(S) = S_a) \\
&\qquad \implies \exists T_c, T'.\ T_c \in \mathsf{completions}(T) \land \Pi_A \rhd (S_a, T') \land T_c \preceq_{\mathrm{lin}} T' .
\end{aligned}
\]

Here the partial mapping $\varphi : \mathit{State} \rightharpoonup \mathit{State}$ relates concrete states to abstract ones.

The side condition $\varphi(S) = S_a$ in the above definition requires the initial concrete
state S to be well-formed in that it represents a valid abstract state $S_a$. For
instance, $\varphi$ may need S to contain a linked list and relate it to an abstract
mathematical set in $S_a$ for a set object. Besides, $\varphi$ should always require the
client states in S and $S_a$ to be identical.
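Definition 1 can be checked by brute force on small histories. The Python sketch below enumerates candidate bijections as index permutations. It uses the illustrative encoding (t, "inv", f, n) for invocations and (t, "ret", n) for returns, and it is meant for histories of a handful of events only, since the search is factorial in the history length.

```python
from itertools import permutations

def is_inv(e):
    return e[1] == "inv"

def is_ret(e):
    return e[1] == "ret"

def proj(T, t):
    return [e for e in T if e[0] == t]

def lin(T, Tp):
    """Return True iff T is linearizable w.r.t. Tp in the sense of Definition 1."""
    if len(T) != len(Tp):
        return False
    # Condition 1: the per-thread sub-traces coincide.
    threads = {e[0] for e in T} | {e[0] for e in Tp}
    if any(proj(T, t) != proj(Tp, t) for t in threads):
        return False
    # Condition 2: some bijection pi maps events to equal events and keeps every
    # return before every invocation that follows it in T.
    n = len(T)
    for pi in permutations(range(n)):
        same_events = all(T[i] == Tp[pi[i]] for i in range(n))
        keeps_order = all(pi[i] < pi[j]
                          for i in range(n) for j in range(i + 1, n)
                          if is_ret(T[i]) and is_inv(T[j]))
        if same_events and keeps_order:
            return True
    return False

inc1 = [(1, "inv", "inc", 0), (1, "ret", 0)]
dec2 = [(2, "inv", "dec", 0), (2, "ret", 0)]

# The two calls overlap, so either sequential order is acceptable.
overlapping = [inc1[0], dec2[0], inc1[1], dec2[1]]
# inc returns before dec is invoked, so dec may not be moved in front of inc.
sequential = inc1 + dec2
```

Here `lin(overlapping, dec2 + inc1)` holds, whereas `lin(sequential, dec2 + inc1)` does not, because the real-time order of the non-overlapping calls would be reversed.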

Next  we  define  a  contextual  refinement  between  the  concrete  object  and  its 
specification,  which  is  equivalent  to  linearizability. 

Definition 3 (Basic Contextual Refinement). $\Pi \sqsubseteq_{\varphi} \Pi_A$ iff

\[
\begin{aligned}
&\forall n, C_1, \dots, C_n, S, S_a.\ \ (\varphi(S) = S_a) \\
&\qquad \implies \mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \dots \parallel C_n), S]\!] \subseteq \mathcal{O}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \dots \parallel C_n), S_a]\!] .
\end{aligned}
\]

Remember that $\mathcal{O}[\![W, S]\!]$ represents the prefix-closed set of observable event
traces generated during the executions of (W, S), which is defined in Figure 5.

Following Filipovic et al. [4], we can prove that linearizability is equivalent
to this contextual refinement. We give the proofs in the TR [14].

Theorem 1 (Basic Equivalence). $\Pi \preceq_{\varphi} \Pi_A \iff \Pi \sqsubseteq_{\varphi} \Pi_A$.

Theorem 1 allows us to use $\Pi \sqsubseteq_{\varphi} \Pi_A$ to identify linearizable objects. However,
we cannot use it to characterize progress properties of objects. For the following
example, $\Pi \sqsubseteq_{\varphi} \Pi_A$ holds although no concrete method call of f could finish (we
assume this object contains a method f only).

    $\Pi$(f): while(true) skip;     $\Pi_A$(f): skip;     C: print(1); f(); print(1);

The reason is that $\Pi \sqsubseteq_{\varphi} \Pi_A$ considers a prefix-closed set of event traces at the
abstract side. For the above client C, the observable behaviors of let $\Pi$ in C
can all be found in the prefix-closed set of behaviors produced by let $\Pi_A$ in C.
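The effect of prefix closure on this example can be worked out with plain sets. In the Python sketch below, observable traces are tuples of printed values and `prefixes` (an illustrative helper) builds the prefix-closed set generated by one trace. The last comparison looks ahead to the complete-trace observations of Section 5.

```python
def prefixes(trace):
    """All prefixes of a finite observable trace, including the empty one."""
    return {trace[:k] for k in range(len(trace) + 1)}

# Concrete object: f() never returns, so the client prints 1 once and then hangs.
concrete_complete = {(1,)}
# Abstract object: f() returns immediately, so the client prints 1 twice.
abstract_complete = {(1, 1)}

# Definition 3 compares prefix-closed sets of finite observable traces.
concrete_prefix_closed = prefixes((1,))     # {(), (1,)}
abstract_prefix_closed = prefixes((1, 1))   # {(), (1,), (1, 1)}

basic_refinement_holds = concrete_prefix_closed <= abstract_prefix_closed
# Comparing complete traces instead exposes the non-terminating call.
complete_refinement_holds = concrete_complete <= abstract_complete
```

The basic refinement holds because the single output of the diverging run is a prefix of the terminating one. Once only complete traces are compared, the inclusion fails.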


4  Formalizing  Progress  Properties 

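We define progress in Figure 6 as properties over both event traces T and object
implementations $\Pi$. We say an object implementation $\Pi$ has a progress property
P iff all its event traces have the property. Here we use $\mathcal{T}_\omega$ to generate the event
traces. Its definition in Figure 6 is similar to $\mathcal{T}[\![W, S]\!]$ of Figure 5, but $\mathcal{T}_\omega[\![W, S]\!]$
is for the set of finite or infinite event traces produced by complete executions.
We use $(W, S) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$ to denote the existence of a T-labelled infinite execution.
$(W, S) \overset{T}{\longmapsto}{}^{*} (\mathbf{skip}, \_)$ represents a terminating execution that produces T. By
using $\lfloor W \rfloor$, we append end at the end of each thread to explicitly mark the
termination of the thread. We also insert the spawning event (spawn, n) at the
beginning of T, where n is the number of threads in W. Then we can use tnum(T)
to get the number n, which is needed to define fairness, as shown below.

Definition. An object $\Pi$ satisfies P under a refinement mapping $\varphi$, $P_{\varphi}(\Pi)$, iff

\[
\forall n, C_1, \dots, C_n, S, T.\ \ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \dots \parallel C_n), S]\!] \land (S \in \mathit{dom}(\varphi)) \implies P(T) .
\]

\[
\begin{aligned}
\mathcal{T}_\omega[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, (\mathbf{spawn}, |W|) :: T \mid (\lfloor W \rfloor, S) \overset{T}{\longmapsto}{}^{\omega}\ \cdot \ \lor\ (\lfloor W \rfloor, S) \overset{T}{\longmapsto}{}^{*} (\mathbf{skip}, \_) \ \lor\ (\lfloor W \rfloor, S) \overset{T}{\longmapsto}{}^{*} \mathbf{abort} \,\} \\
\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \dots \parallel C_n \rfloor &\stackrel{\mathrm{def}}{=} \mathbf{let}\ \Pi\ \mathbf{in}\ (C_1; \mathbf{end}) \parallel \dots \parallel (C_n; \mathbf{end}) \\
|\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \dots \parallel C_n| &\stackrel{\mathrm{def}}{=} n \qquad\qquad \mathsf{tnum}((\mathbf{spawn}, n) :: T) \stackrel{\mathrm{def}}{=} n \\
\mathsf{pend\_inv}(T) &\stackrel{\mathrm{def}}{=} \{\, e \mid \exists i.\ e = T(i) \land \mathsf{is\_inv}(e) \land \neg \exists j.\ (j > i \land \mathsf{match}(e, T(j))) \,\} \\
\text{prog-t}(T) &\ \text{iff}\ \ \forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \implies \exists j.\ j > i \land \mathsf{match}(e, T(j)) \\
\text{prog-s}(T) &\ \text{iff}\ \ \forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \implies \exists j.\ j > i \land \mathsf{is\_ret}(T(j)) \\
\mathsf{abt}(T) &\ \text{iff}\ \ \exists i.\ \mathsf{is\_abt}(T(i)) \\
\mathsf{sched}(T) &\ \text{iff}\ \ |T| = \omega \land \mathsf{pend\_inv}(T) \neq \emptyset \implies \exists e.\ e \in \mathsf{pend\_inv}(T) \land |(T|_{\mathsf{tid}(e)})| = \omega \\
\mathsf{fair}(T) &\ \text{iff}\ \ |T| = \omega \implies \forall t \in [1..\mathsf{tnum}(T)].\ |(T|_t)| = \omega \lor \mathsf{last}(T|_t) = (t, \mathbf{term}) \\
\mathsf{iso}(T) &\ \text{iff}\ \ |T| = \omega \implies \exists t, i.\ (\forall j.\ j > i \implies \mathsf{tid}(T(j)) = t)
\end{aligned}
\]

\[
\begin{array}{ll}
\text{wait-free iff } \mathsf{sched} \implies \text{prog-t} \lor \mathsf{abt} & \text{starvation-free iff } \mathsf{fair} \implies \text{prog-t} \lor \mathsf{abt} \\
\text{lock-free iff } \mathsf{sched} \implies \text{prog-s} \lor \mathsf{abt} & \text{deadlock-free iff } \mathsf{fair} \implies \text{prog-s} \lor \mathsf{abt} \\
\text{obstruction-free iff } \mathsf{sched} \land \mathsf{iso} \implies \text{prog-t} \lor \mathsf{abt} &
\end{array}
\]

Fig. 6: Formalizing Progress Properties

\[
\begin{array}{ll}
\text{lock-free} \iff \text{wait-free} \lor \text{prog-s} & \text{starvation-free} \iff \text{wait-free} \lor \neg\mathsf{fair} \\
\text{obstruction-free} \iff \text{lock-free} \lor \neg\mathsf{iso} & \text{deadlock-free} \iff \text{lock-free} \lor \neg\mathsf{fair}
\end{array}
\]

Fig. 7: Relationships between Progress Properties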

Before formulating each progress property over event traces, we first define
some auxiliary properties in Figure 6. prog-t(T) guarantees that every method
call in T eventually finishes. prog-s(T) guarantees that some pending method
call finishes. Different from prog-t, the return event T(j) in prog-s does not have
to be a matching return of the pending invocation e. abt(T) says that T ends
with a fault event.

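There are three useful conditions on scheduling. The basic requirement for
a good schedule is sched. If T is infinite and there exist pending calls, then at
least one pending thread should be scheduled infinitely often. In fact, there are
two possible reasons causing a method call of thread t to pend. Either t is no
longer scheduled, or it is always scheduled but the method call never finishes.
sched rules out the bad schedule where no thread with an invoked method is
active. For instance, the following infinite trace does not satisfy sched.

\[
(t_1, f_1, n_1) :: (t_2, f_2, n_2) :: (t_1, \mathbf{obj}) :: (t_3, \mathbf{clt}) :: (t_3, \mathbf{clt}) :: (t_3, \mathbf{clt}) :: \cdots
\]

If T is infinite, fair(T) requires every non-terminating thread be scheduled
infinitely often; and iso(T) requires eventually only one thread be scheduled. We
can see that a fair schedule is a good schedule satisfying sched.

\[
\begin{aligned}
\mathsf{div\_tids}(T) &\stackrel{\mathrm{def}}{=} \{\, t \mid |(T|_t)| = \omega \,\} \\
\mathcal{O}_{\omega}[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, \mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W, S]\!] \,\} \\
\mathcal{O}_{i\omega}[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, \mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W, S]\!] \land \mathsf{iso}(T) \,\} \\
\mathcal{O}_{f\omega}[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, \mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W, S]\!] \land \mathsf{fair}(T) \,\} \\
\mathcal{O}_{t\omega}[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, (\mathsf{get\_obsv}(T), \mathsf{div\_tids}(T)) \mid T \in \mathcal{T}_\omega[\![W, S]\!] \,\} \\
\mathcal{O}_{ft\omega}[\![W, S]\!] &\stackrel{\mathrm{def}}{=} \{\, (\mathsf{get\_obsv}(T), \mathsf{div\_tids}(T)) \mid T \in \mathcal{T}_\omega[\![W, S]\!] \land \mathsf{fair}(T) \,\}
\end{aligned}
\]

Fig. 8: Generation of Complete Event Traces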

At the bottom of Figure 6 we define the progress properties formally. We
omit the parameter T in the formulae to simplify the presentation. An event
trace T is wait-free (i.e., wait-free(T) holds) if under the good schedule sched, it
guarantees prog-t unless it ends with a fault. lock-free(T) is similar except that
it guarantees prog-s. Starvation-freedom and deadlock-freedom guarantee prog-t
and prog-s under fair scheduling. Obstruction-freedom guarantees prog-t if some
pending thread is always scheduled (sched) and runs in isolation (iso).
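The trace-level definitions can be made executable for ultimately periodic traces. The Python sketch below represents a trace as a finite prefix followed by a cycle repeated forever, a restriction that is an assumption of this sketch and not of the paper. Events are abbreviated to (t, kind) tuples, and the thread count of the leading (spawn, n) event is stored in the field `nthreads`. Looking ahead over two copies of the cycle is enough to decide each predicate exactly for such traces.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lasso:
    prefix: tuple      # events that occur once
    cycle: tuple = ()  # events repeated forever; () means the trace is finite
    nthreads: int = 0  # the n carried by the leading (spawn, n) event

def tid(e): return e[0]
def is_inv(e): return e[1] == "inv"
def is_ret(e): return e[1] == "ret"
def is_abt(e): return e[-1] == "abort"

def unrolled(tr):
    # The prefix plus two periods: enough lookahead for every position below horizon.
    return tr.prefix + tr.cycle + tr.cycle

def horizon(tr):
    return len(tr.prefix) + len(tr.cycle)

def unmatched(events):
    """Indices of invocations with no later return of the same thread."""
    return [k for k, e in enumerate(events) if is_inv(e)
            and not any(is_ret(r) and tid(r) == tid(e) for r in events[k + 1:])]

def pend_inv(tr):
    return [k for k in unmatched(unrolled(tr)) if k < horizon(tr)]

def prog_t(tr):
    return not pend_inv(tr)

def prog_s(tr):
    u = unrolled(tr)
    return all(not unmatched(u[:i + 1]) or any(is_ret(e) for e in u[i + 1:])
               for i in range(horizon(tr)))

def abt(tr):
    return any(is_abt(e) for e in unrolled(tr))

def runs_forever(tr, t):
    return any(tid(e) == t for e in tr.cycle)

def sched(tr):
    pending = pend_inv(tr)
    if not tr.cycle or not pending:
        return True
    u = unrolled(tr)
    return any(runs_forever(tr, tid(u[k])) for k in pending)

def fair(tr):
    if not tr.cycle:
        return True
    def terminated(t):
        own = [e for e in tr.prefix if tid(e) == t]
        return bool(own) and own[-1] == (t, "term")
    return all(runs_forever(tr, t) or terminated(t)
               for t in range(1, tr.nthreads + 1))

def iso(tr):
    return not tr.cycle or len({tid(e) for e in tr.cycle}) == 1

def implies(a, b): return (not a) or b
def wait_free(tr): return implies(sched(tr), prog_t(tr) or abt(tr))
def lock_free(tr): return implies(sched(tr), prog_s(tr) or abt(tr))
def obstruction_free(tr): return implies(sched(tr) and iso(tr), prog_t(tr) or abt(tr))
def starvation_free(tr): return implies(fair(tr), prog_t(tr) or abt(tr))
def deadlock_free(tr): return implies(fair(tr), prog_s(tr) or abt(tr))

# The bad schedule from the text: two pending calls, but only thread 3 keeps running.
bad_schedule = Lasso(((1, "inv"), (2, "inv"), (1, "obj")), ((3, "clt"),), 3)

# Client (2.1) on the counter of Figure 2(b): thread 2 completes inc forever
# while thread 1 keeps stepping inside a call that never returns.
starved_cas = Lasso(((1, "inv"),),
                    ((2, "inv"), (2, "obj"), (2, "ret"), (1, "obj")), 2)
```

On `starved_cas`, `lock_free` and `deadlock_free` hold while `wait_free` and `starvation_free` fail, matching the discussion of Figure 2(b). On `bad_schedule`, `sched` fails, so all five properties hold vacuously.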

Figure  7  contains  lemmas  that  relate  progress  properties.  For  instance,  an 
event  trace  is  starvation-free,  iff  it  is  wait-free  or  not  fair.  These  lemmas  give  us 
the  relationship  lattice  in  Figure  1.  To  close  the  lattice,  we  also  define  a  progress 
property  in  the  sequential  setting.  Sequential  termination  guarantees  that  every 
method  call  must  finish  in  a  trace  produced  by  a  sequential  client.  The  formal 
definition  is  given  in  the  companion  TR  [14],  and  we  prove  that  it  is  implied  by 
each  of  the  five  progress  properties  for  concurrent  objects. 

5  Equivalence  to  Contextual  Refinements 

We extend the basic contextual refinement in Definition 3 to observe progress
as well as linearizability. For each progress property, we carefully choose the
observable behaviors at the concrete and the abstract levels.


5.1  Observable  Behaviors 

In  Figure  8,  we  define  various  observable  behaviors  for  the  termination-sensitive 
contextual  refinements. 

We use $\mathcal{O}_{\omega}[\![W, S]\!]$ to represent the set of observable event traces produced
by complete executions of (W, S). Recall that get_obsv(T) gets the sub-trace
of T consisting of all the observable events only. Unlike the prefix-closed set
$\mathcal{O}[\![W, S]\!]$, this definition utilizes $\mathcal{T}_\omega[\![W, S]\!]$ (see Figure 6) whose event traces are
all complete and could be infinite. Thus it allows us to observe divergence of the
whole program. $\mathcal{O}_{i\omega}$ and $\mathcal{O}_{f\omega}$ take the complete observable traces of isolating
and fair executions respectively. Here iso(T) and fair(T) are defined in Figure 6.

  P                   Observable behaviors compared in $\Pi \sqsubseteq^{P}_{\varphi} \Pi_A$
  wait-free           $\mathcal{O}_{t\omega} \subseteq \mathcal{O}_{t\omega}$
  lock-free           $\mathcal{O}_{\omega} \subseteq \mathcal{O}_{\omega}$
  obstruction-free    $\mathcal{O}_{i\omega} \subseteq \mathcal{O}_{\omega}$
  deadlock-free       $\mathcal{O}_{f\omega} \subseteq \mathcal{O}_{\omega}$
  starvation-free     $\mathcal{O}_{ft\omega} \subseteq \mathcal{O}_{t\omega}$

Table 2: Contextual Refinements $\Pi \sqsubseteq^{P}_{\varphi} \Pi_A$ for Progress Properties P

We could also observe divergence of individual threads rather than the whole
program. We define div_tids(T) to collect the set of threads that diverge in the
trace T. Then we write $\mathcal{O}_{t\omega}[\![W, S]\!]$ to get both the observable behaviors and the
diverging threads in the complete executions. $\mathcal{O}_{ft\omega}[\![W, S]\!]$ is defined similarly but
considers fair executions only.

More  on  divergence.  In  general,  divergence  means  non-termination.  For  example, 
we  could  say  that  the  following  two-threaded  program  (5.1)  must  diverge  since 
it  never  terminates. 


    x := x + 1;  ||  while (true) skip;        (5.1)

But  for  individual  threads,  divergence  is  not  equivalent  to  non-termination,  since 
a  non-terminating  thread  may  either  have  an  infinite  execution  or  simply  be  not 
scheduled  from  some  point  due  to  unfair  scheduling.  We  view  only  the  former 
case  as  divergence.  For  instance,  in  an  unfair  execution,  the  left  thread  of  (5.1) 
may  never  be  scheduled  and  hence  it  has  no  chance  to  terminate.  It  does  not 
diverge.  Similarly,  for  the  following  program  (5.2), 

while  (true)  skip;  ||  while  (true)  skip;  (5.2) 

the  whole  program  must  diverge,  but  it  is  possible  that  a  single  thread  does  not 
diverge  in  an  execution. 
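The distinction between a thread that diverges and one that merely stops being scheduled can be made concrete on ultimately periodic traces. In the Python sketch below a trace is given by a finite prefix and a cycle that repeats forever. This representation, the abbreviated (t, kind) events and the helper `thread_status` are assumptions made for illustration.

```python
def div_tids(cycle):
    """Threads with infinitely many events, i.e. those that occur in the repeating cycle."""
    return {e[0] for e in cycle}

def thread_status(prefix, cycle, nthreads):
    """Classify each thread of the trace prefix followed by cycle repeated forever."""
    diverging = div_tids(cycle)
    terminated = {e[0] for e in prefix if e[1] == "term"}
    status = {}
    for t in range(1, nthreads + 1):
        if t in diverging:
            status[t] = "diverges"
        elif t in terminated:
            status[t] = "terminated"
        else:
            status[t] = "never scheduled again"
    return status

# Program (5.1), fair execution: the left thread runs and terminates.
fair_51 = thread_status(((1, "clt"), (1, "term")), ((2, "clt"),), 2)

# Program (5.1), unfair execution: the left thread is never scheduled.
unfair_51 = thread_status((), ((2, "clt"),), 2)

# Program (5.2), unfair execution: only thread 1 ever runs.
unfair_52 = thread_status((), ((1, "clt"),), 2)
```

In `unfair_51` the left thread does not terminate, yet it is not in `div_tids`, which is the case the text excludes from divergence. In `unfair_52` the whole program diverges although thread 2 does not.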

5.2  New  Contextual  Refinements  and  Equivalence  Results 

In  Table  2,  we  summarize  the  definitions  of  the  termination-sensitive  contextual 
refinements.  Each  new  contextual  refinement  follows  the  basic  one  in  Definition  3 
but  takes  different  observable  behaviors  as  specified  in  Table  2.  For  example,  the 
contextual  refinement  for  wait-freedom  is  formally  defined  as  follows: 

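\[
\begin{aligned}
\Pi \sqsubseteq^{\text{wait-free}}_{\varphi} \Pi_A \ \ \text{iff}\ \ \big(&\forall n, C_1, \dots, C_n, S, S_a.\ (\varphi(S) = S_a) \\
&\implies \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \dots \parallel C_n), S]\!] \subseteq \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \dots \parallel C_n), S_a]\!] \big) .
\end{aligned}
\]

Theorem 2 says that linearizability together with a progress property P is
equivalent to the corresponding contextual refinement $\Pi \sqsubseteq^{P}_{\varphi} \Pi_A$.

Theorem 2 (Equivalence). $\Pi \preceq_{\varphi} \Pi_A \land P_{\varphi}(\Pi) \iff \Pi \sqsubseteq^{P}_{\varphi} \Pi_A$, where P is
wait-free, lock-free, obstruction-free, deadlock-free or starvation-free.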

Here we assume the object specification $\Pi_A$ is total, i.e., the abstract operations
never block. We provide the proofs of our equivalence results in the TR [14].

The contextual refinement for wait-freedom takes $\mathcal{O}_{t\omega}$ at both the concrete
and the abstract levels. The divergence of individual threads as well as I/O
events are treated as observable behaviors. The intuition of the equivalence is as
follows. Since a wait-free object $\Pi$ guarantees that every method call finishes,
we have to blame the client code itself for the divergence of a thread using $\Pi$.
That is, even if the thread uses the abstract object $\Pi_A$, it must still diverge.

As an example, consider the client program (2.1). Intuitively, for any execution
in which the client uses the abstract operations, only the right thread $t_2$
diverges. Thus $\mathcal{O}_{t\omega}$ of the abstract program is a singleton set $\{(\epsilon, \{t_2\})\}$. When
the client uses the wait-free object in Figure 2(a), its $\mathcal{O}_{t\omega}$ set is still $\{(\epsilon, \{t_2\})\}$.
It does not produce more observable behaviors. But if it uses a non-wait-free
object (such as the one in Figure 2(b)), the left thread $t_1$ does not necessarily
finish. The $\mathcal{O}_{t\omega}$ set becomes $\{(\epsilon, \{t_2\}), (\epsilon, \{t_1, t_2\})\}$. It produces more observable
behaviors than the abstract client, breaking the contextual refinement. Thanks
to observing div_tids that collects the diverging threads, we can rule out
non-wait-free objects which may cause more threads to diverge.
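The comparison in this example is an ordinary set inclusion and can be written out directly. In the Python sketch below each observable behavior is a pair of an output trace (a tuple, empty for $\epsilon$) and a frozen set of diverging thread names. The sets are copied from the example above and are not computed from the programs.

```python
EPS = ()  # the empty observable trace

def refines(concrete, abstract):
    """The inclusion demanded by the contextual refinement for one fixed client."""
    return concrete <= abstract

# Client (2.1): inc(); || while (true) inc();
abstract_obs = {(EPS, frozenset({"t2"}))}
wait_free_obs = {(EPS, frozenset({"t2"}))}                # object of Figure 2(a)
lock_free_obs = {(EPS, frozenset({"t2"})),
                 (EPS, frozenset({"t1", "t2"}))}          # object of Figure 2(b)

# The behavior that the abstract program cannot match.
extra = lock_free_obs - abstract_obs
```

Here `refines(wait_free_obs, abstract_obs)` holds, while `refines(lock_free_obs, abstract_obs)` fails because of the behavior in `extra`, where the left thread also diverges.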

$\Pi \sqsubseteq^{\text{lock-free}}_{\varphi} \Pi_A$ takes coarser observable behaviors. We observe the divergence
of the whole client program by using $\mathcal{O}_{\omega}$ at both the concrete and the abstract
levels. Intuitively, a lock-free object $\Pi$ ensures that some method call will finish,
thus the client using $\Pi$ diverges only if there are an infinite number of method
calls. Then it must also diverge when using the abstract object $\Pi_A$.

For example, consider the client (2.1). The whole client program diverges in
every execution both when it uses the lock-free object in Figure 2(b) and when
it uses the abstract one. The $\mathcal{O}_{\omega}$ set of observable behaviors is $\{\epsilon\}$ at both levels.
On the other hand, the following client must terminate and print out both 1 and
2 in every execution. The $\mathcal{O}_{\omega}$ set is $\{1 :: 2 :: \epsilon,\ 2 :: 1 :: \epsilon\}$ at both levels.

    inc(); print(1);  ||  dec(); print(2);        (5.3)

Instead, if the client (5.3) uses the non-lock-free object in Figure 2(c), it may
diverge and nothing is printed out. The $\mathcal{O}_{\omega}$ set becomes $\{\epsilon,\ 1 :: 2 :: \epsilon,\ 2 :: 1 :: \epsilon\}$,
which contains more behaviors than the abstract side. Thus $\Pi \sqsubseteq^{\text{lock-free}}_{\varphi} \Pi_A$ fails.

Obstruction-freedom ensures progress for isolating executions in which
eventually only one thread is running. Correspondingly, $\Pi \sqsubseteq^{\text{obstruction-free}}_{\varphi} \Pi_A$ restricts
our considerations to isolating executions. It takes $\mathcal{O}_{i\omega}$ at the concrete level and
$\mathcal{O}_{\omega}$ at the abstract level.

To understand the equivalence, consider the client (5.3) again. For isolating
executions with the obstruction-free object in Figure 2(c), it must terminate and
print out both 1 and 2. The $\mathcal{O}_{i\omega}$ set at the concrete level is $\{1 :: 2 :: \epsilon,\ 2 :: 1 :: \epsilon\}$, the
same as the $\mathcal{O}_{\omega}$ set of the abstract side. Non-obstruction-free objects in general
do not guarantee progress for some isolating executions. If the client uses the
object in Figure 2(d) or (e), the $\mathcal{O}_{i\omega}$ set is $\{\epsilon,\ 1 :: 2 :: \epsilon,\ 2 :: 1 :: \epsilon\}$, not a subset of
the abstract $\mathcal{O}_{\omega}$ set. The undesired empty observable trace is produced by unfair
executions, where a thread acquires the lock and gets suspended and then the
other thread would keep requesting the lock forever (it is executed in isolation).

$\Pi \sqsubseteq^{\text{deadlock-free}}_{\varphi} \Pi_A$ uses $\mathcal{O}_{f\omega}$ at the concrete side, ruling out undesired
divergence caused by unfair scheduling. For the client (5.3) with the object in
Figure 2(d) or (e), its $\mathcal{O}_{f\omega}$ set is the same as the set $\mathcal{O}_{\omega}$ at the abstract level.

For $\Pi \sqsubseteq^{\text{starvation-free}}_{\varphi} \Pi_A$, we consider only fair executions at the concrete
level (similar to deadlock-freedom), but observe the divergence of individual
threads rather than the whole program (similar to wait-freedom). It uses $\mathcal{O}_{ft\omega}$
at the concrete side and $\mathcal{O}_{t\omega}$ at the abstract level. For the client (5.3) with the
starvation-free object in Figure 2(e), no thread diverges in any fair execution.
Then the set $\mathcal{O}_{ft\omega}$ of observable behaviors is $\{(1 :: 2 :: \epsilon, \emptyset),\ (2 :: 1 :: \epsilon, \emptyset)\}$, which is
the same as the set $\mathcal{O}_{t\omega}$ at the abstract level.

Observing threaded divergence allows us to distinguish starvation-free objects
from deadlock-free objects. Consider the client (2.1). Under fair scheduling, we
know only the right thread $t_2$ would diverge when using the starvation-free
object in Figure 2(e). The set $\mathcal{O}_{ft\omega}$ is $\{(\epsilon, \{t_2\})\}$. It coincides with the abstract
behaviors $\mathcal{O}_{t\omega}$. But when using the deadlock-free object of Figure 2(d), the $\mathcal{O}_{ft\omega}$
set becomes $\{(\epsilon, \{t_2\}), (\epsilon, \{t_1, t_2\})\}$, breaking the contextual refinement.

6  Related  Work  and  Conclusion 

There is a large body of work discussing the five progress properties and the
contextual refinements individually. Our work in contrast studies their relationships,
which have not been considered much before.

Gotsman and Yang [6] propose a new linearizability definition that preserves
lock-freedom, and suggest a connection between lock-freedom and a
termination-sensitive contextual refinement. We do not redefine linearizability here. Instead,
we propose a unified framework to systematically relate all five progress
properties plus linearizability to various contextual refinements.

Herlihy and Shavit [10] informally discuss all five progress properties.
Our definitions in Section 4 mostly follow their explanations, but they are more
formal and close the gap between program semantics and their history-based
interpretations. We also notice that their obstruction-freedom is inappropriate
for some examples (see TR [14]), and propose a different definition that is closer
to the common intuition [9]. In addition, we relate the progress properties to
contextual refinements, which consider the extensional effects on client behaviors.

Fossati et al. [5] propose a uniform approach in the $\pi$-calculus to formulate
both the standard progress properties and their observational approximations.
Their technical setting is completely different from ours. Also, their observational
approximations for lock-freedom and wait-freedom are strictly weaker than the
standard notions. Their deadlock-freedom and starvation-freedom are not
formulated, and there is no observational approximation given for obstruction-freedom.
In comparison, our framework relates each of the five progress properties (plus
linearizability) to an equivalent contextual refinement.

There are also formulations of progress properties based on temporal logics.
For example, Petrank et al. [15] formalize the three non-blocking properties and
Dongol [3] formalizes all five progress properties, using linear temporal logics.
Those formulations make it easier to do model checking (e.g., Petrank et al. [15]
also build a tool to model check a variant of lock-freedom), while our contextual
refinement framework is potentially helpful for modular Hoare-style verification.

Conclusion. We have introduced a contextual refinement framework to unify
various progress properties. For linearizable objects, each progress property is
equivalent to a specific termination-sensitive contextual refinement, as
summarized in Table 1. The framework allows us to verify safety and liveness properties
of client programs at a high abstraction level by replacing concrete method
implementations with abstract operations. It also makes it possible to borrow ideas
from existing proof methods for contextual refinements to verify linearizability
and a progress property together, which we leave as future work.

Abstract. Implementations of concurrent objects should guarantee
linearizability and a progress property such as wait-freedom, lock-freedom,
obstruction-freedom, starvation-freedom, or deadlock-freedom.
Conventional informal or semi-formal definitions of these progress properties
describe conditions under which a method call is guaranteed to
complete, but it is unclear how these definitions can be utilized to formally
verify system software in a layered and modular way.

In this paper, we propose a unified framework based on contextual
refinements to show exactly how progress properties affect the behaviors
of client programs. We give formal operational definitions of all common
progress properties and prove that for linearizable objects, each progress
property is equivalent to a specific type of contextual refinement that
preserves termination. The equivalence ensures that verification of such
a contextual refinement for a concurrent object guarantees both
linearizability and the corresponding progress property. Contextual refinement
also enables us to verify safety and liveness properties of client programs
at a high abstraction level by soundly replacing concrete method
implementations with abstract atomic operations.


1  Introduction 

A  concurrent  object  consists  of  shared  data  and  a  set  of  methods  that  provide 
an  interface  for  client  threads  to  manipulate  and  access  the  shared  data.  The 
synchronization  of  simultaneous  data  access  within  the  object  affects  the  progress 
of  the  execution  of  the  client  threads  in  the  system. 

Various progress properties have been proposed for concurrent objects. The
most important ones are wait-freedom, lock-freedom and obstruction-freedom for
non-blocking implementations, and starvation-freedom and deadlock-freedom for
lock-based implementations. These properties describe conditions under which
method calls are guaranteed to successfully complete in an execution. For
example, lock-freedom guarantees that “infinitely often some method call finishes in
a finite number of steps” [10].

Nevertheless, the common informal or semi-formal definitions of the progress
properties are difficult to use in a modular and layered program verification
because they fail to describe how the progress properties affect clients. In a modular
verification of client threads, the concrete implementation $\Pi$ of the object
methods should be replaced by an abstraction (or specification) $\Pi_A$ that consists of
equivalent atomic methods. The progress properties should then characterize
whether and how the behaviors of a client program will be affected if a client
uses $\Pi$ instead of $\Pi_A$. In particular, we are interested in systematically
studying whether the termination of a client using the abstract methods $\Pi_A$ will be
preserved when using an implementation $\Pi$ with some progress guarantee.

Previous work on verifying the safety of concurrent objects (e.g., [4, 13]) has
shown that linearizability (a standard safety criterion for concurrent objects)
and contextual refinement are equivalent. Informally, an implementation $\Pi$ is
a contextual refinement of a (more abstract) implementation $\Pi_A$, if every
observable behavior of any client program using $\Pi$ can also be observed when the
client uses $\Pi_A$ instead. To obtain equivalence to linearizability, the observable
behaviors include I/O events but not divergence (i.e., non-termination).
Recently, Gotsman and Yang [7] showed that a client program that diverges using
a linearizable and lock-free object must also diverge when using the abstract
operations instead. Their work reveals a connection between lock-freedom and
a form of contextual refinement which preserves termination as well as safety
properties. It is unclear how other progress guarantees affect termination of
client programs and how they are related to contextual refinements.

This paper studies all five commonly used progress properties and their
relationships to contextual refinements. We propose a unified framework in which
a certain type of termination-sensitive contextual refinement is equivalent to
linearizability together with one of the progress properties. The idea is to
identify different observable behaviors for different progress properties. For example,
for the contextual refinement for lock-freedom we observe the divergence of the
whole program, while for wait-freedom we also need to observe which threads in
the program diverge. For lock-based progress properties, e.g., starvation-freedom
and deadlock-freedom, we have to take fair schedulers into account.

Our  paper  makes  the  following  new  contributions: 

—  We  formalize  the  definitions  of  the  five  most  common  progress  properties: 
wait-freedom,  lock-freedom,  obstruction-freedom,  starvation-freedom,  and 
deadlock-freedom.  Our  formulation  is  based  on  possibly  infinite  event  traces 
that  are  operationally  generated  by  any  client  using  the  object. 

—  Based on our formalization, we prove relationships between the progress
properties. For example, wait-freedom implies lock-freedom and
starvation-freedom implies deadlock-freedom. These relationships form a lattice shown
in Figure 1 (where the arrows represent implications). We close the lattice
with a bottom element that we call sequential termination, a progress
property in the sequential setting. It is weaker than any other progress property.

—  We develop a unified framework to characterize progress properties via
contextual refinements. With linearizability, each progress property is proved
equivalent to a contextual refinement which takes into account divergence of
programs. The formal proofs of our results can be found in Appendix B.

                      Wait-freedom

        Lock-freedom            Starvation-freedom

        Obstruction-freedom     Deadlock-freedom

                      Sequential termination

Fig. 1: Relationships between Progress Properties
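
The implications of Figure 1 can be explored mechanically by treating the lattice as a directed graph. The Python sketch below does this by reachability. Only two arrows (wait-freedom to lock-freedom, starvation-freedom to deadlock-freedom) and the bottom element are stated explicitly in the text; the remaining edges are an assumption based on the figure layout and the usual progress hierarchy. The names `IMPLIES` and `implies` are hypothetical.

```python
# Direct implications (arrows of the lattice); edges beyond those named
# in the text are assumed from the figure layout.
IMPLIES = {
    "wait-freedom": ["lock-freedom", "starvation-freedom"],
    "lock-freedom": ["obstruction-freedom", "deadlock-freedom"],
    "starvation-freedom": ["deadlock-freedom"],
    "obstruction-freedom": ["sequential termination"],
    "deadlock-freedom": ["sequential termination"],
    "sequential termination": [],
}


def implies(p, q):
    """Return True if property p implies property q (reflexive, transitive)."""
    seen, stack = set(), [p]
    while stack:
        current = stack.pop()
        if current == q:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES[current])
    return False


print(implies("wait-freedom", "deadlock-freedom"))    # True
print(implies("starvation-freedom", "lock-freedom"))  # False
```

Under these assumed edges, every property reaches sequential termination, matching the statement that it is weaker than any other progress property.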

By extending earlier equivalence results on linearizability [4], our contextual
refinement framework can serve as a new alternative definition for the full
correctness properties of concurrent objects. The contextual refinement implied by
linearizability and a progress guarantee precisely characterizes the properties at
the abstract level that are preserved by the object implementation. When
proving these properties of a client of the object, we can soundly replace the concrete
method implementations by their abstract operations. On the other hand, since the
contextual refinement also implies linearizability and the progress property, we
can potentially borrow ideas from existing proof methods for contextual
refinements, such as simulations (e.g., [14]) and logical relations (e.g., [2]), to verify
linearizability and the progress guarantee together.

In the remainder of this paper, we first informally explain our framework
in Section 2. We then introduce the formal setting in Section 3, including the
definition of linearizability as the safety criterion of objects. We formulate the
progress properties in Section 4 and the contextual refinement framework in
Section 5. We discuss related work and conclude in Section 6.

2  Informal  Account 

In  this  section,  we  informally  describe  our  results.  We  first  give  an  overview  of 
linearizability  and  its  equivalence  to  the  basic  contextual  refinement.  Then  we 
explain  the  progress  properties  and  summarize  our  new  equivalence  results. 

Linearizability and Contextual Refinement. Linearizability is a standard
safety criterion for concurrent objects [10]. Intuitively, linearizability describes
atomic behaviors of object implementations. It requires that each method call
should appear to take effect instantaneously at some moment between its
invocation and return.

Linearizability intuitively establishes a correspondence between the object
implementation $\Pi$ and the intended atomic operations $\Pi_A$. This correspondence
can also be understood as a contextual refinement. Informally, we say that $\Pi$ is a
contextual refinement of $\Pi_A$, written $\Pi \sqsubseteq \Pi_A$, if substituting $\Pi$ for $\Pi_A$ in any context
(i.e., in a client program) does not add observable behaviors. External observers
cannot tell that $\Pi_A$ has been replaced by $\Pi$ from monitoring the behaviors of
the client program.

It has been proved [4, 13] that linearizability is equivalent to a contextual
refinement in which the observable behaviors are finite traces of I/O events. Thus
this basic contextual refinement can be used to distinguish linearizable objects
from non-linearizable ones. But it cannot characterize progress properties of
objects.

Progress Properties. Figure 2 shows several implementations of a counter
with different progress guarantees that we study in this paper. A counter object
provides the two methods inc and dec for incrementing and decrementing a
shared variable x. The implementations given here are not intended to be
practical but merely to demonstrate the meanings of the progress properties. We
assume that every command is executed atomically.

Informally, an object implementation is wait-free, if it guarantees that every
thread can complete any started operation of the data structure in a finite
number of steps [8]. Figure 2(a) shows an ideal wait-free implementation in which the
increment and the decrement are done atomically. This implementation is
obviously wait-free since it guarantees termination of every method call regardless of
interference from other threads. Note that realistic implementations of wait-free
counters are more complex and involve arrays and atomic snapshots [1].

Lock-freedom is similar to wait-freedom but only guarantees that some thread
will complete an operation in a finite number of steps [8]. Typical lock-free
implementations (such as the well-known Treiber stack, HSY elimination-backoff stack
and Harris-Michael lock-free list) use the atomic compare-and-swap instruction
cas in a loop to repeatedly attempt an update until it succeeds. Figure 2(b)
shows such an implementation of the counter object. It is lock-free, because
whenever inc and dec operations are executed concurrently, there always exists
some successful update. Note that this object is not wait-free. For the following
program (2.1), the cas instruction in the method called by the left thread may
continuously fail due to the continuous updates of x made by the right thread.

    inc();  ||  while (true) inc();        (2.1)
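
The schedule behind this observation can be replayed deterministically. The following Python sketch is a simulation under stated assumptions, not the formal semantics of the paper: it treats each read of x and each cas as one atomic step, models a call to the lock-free inc as a generator, and uses the hypothetical names `inc_steps` and `adversarial_run`.

```python
def inc_steps(counter):
    """One call to the lock-free inc(); yields after each atomic step."""
    while True:
        t = counter["x"]              # t := x
        yield "read"
        if counter["x"] == t:         # b := cas(&x, t, t+1)
            counter["x"] = t + 1
            yield "cas ok"
            return
        yield "cas failed"


def adversarial_run(rounds):
    """Interleave the left thread's single inc() with the right thread's loop."""
    counter = {"x": 0}
    left = inc_steps(counter)
    right_completed = left_failures = 0
    for _ in range(rounds):
        next(left)                        # left reads x
        right = inc_steps(counter)        # right starts its next inc()
        next(right)                       # right reads x
        if next(right) == "cas ok":       # right's update succeeds
            right_completed += 1
        if next(left) == "cas failed":    # left's cas sees a changed x
            left_failures += 1
    return counter["x"], right_completed, left_failures


print(adversarial_run(5))  # (5, 5, 5)
```

In every round some call completes (the lock-free guarantee), yet the left thread's call never does, which is why the object is not wait-free.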

Herlihy et al. [9] propose obstruction-freedom which “guarantees progress
for any thread that eventually executes in isolation” (i.e., without other active
threads in the system). They present two double-ended queues as examples. In
Figure 2(c) we show an obstruction-free counter that may look contrived but
nevertheless illustrates the idea of the progress property.

The implementation introduces a variable i, and lets inc perform the atomic
increment after increasing i to 10 and dec do the atomic decrement after
decreasing i to 0. Whenever a method is executed in isolation (i.e., without interference
from other threads), it will complete. Thus the object is obstruction-free. It is
not lock-free, because for the client

    inc();  ||  dec();        (2.2)

which executes an increment and a decrement concurrently, it is possible that
neither of the method calls returns. For instance, under a specific schedule, every
increment over i made by the left thread is immediately followed by a decrement
from the right thread.

    inc() { x := x + 1; }
    dec() { x := x - 1; }

(a) Wait-Free (Ideal) Impl.

    inc() {
      local t, b;
      do {
        t := x;
        b := cas(&x, t, t+1);
      } while (!b);
    }

(b) Lock-Free Impl.

    inc() {
      while (i < 10) {
        i := i + 1;
      }
      x := x + 1;
    }
    dec() {
      while (i > 0) {
        i := i - 1;
      }
      x := x - 1;
    }

(c) Obstruction-Free Impl.

    inc() {
      TestAndSet_lock();
      x := x + 1;
      TestAndSet_unlock();
    }

(d) Deadlock-Free Impl.

    inc() {
      Bakery_lock();
      x := x + 1;
      Bakery_unlock();
    }

(e) Starvation-Free Impl.

Fig. 2: Counter Objects with Methods inc and dec
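
The schedule for client (2.2) in which neither call returns can be replayed step by step. The Python sketch below is a simulation only: it assumes that each loop test and each assignment is one atomic command, that i and x start at 0 (the initial values are not given in the text), and it uses hypothetical helper names `run_alone` and `livelock`.

```python
def inc(s):
    """Obstruction-free inc(); each next() executes one atomic command."""
    while True:
        in_loop = s["i"] < 10      # loop test
        yield "test"
        if not in_loop:
            break
        s["i"] += 1                # i := i + 1
        yield "i+1"
    s["x"] += 1                    # x := x + 1
    yield "x+1"


def dec(s):
    """Obstruction-free dec(); each next() executes one atomic command."""
    while True:
        in_loop = s["i"] > 0       # loop test
        yield "test"
        if not in_loop:
            break
        s["i"] -= 1                # i := i - 1
        yield "i-1"
    s["x"] -= 1                    # x := x - 1
    yield "x-1"


def run_alone(method):
    """Run one method in isolation from the assumed initial state."""
    s = {"i": 0, "x": 0}
    steps = list(method(s))
    return s, len(steps)


def livelock(rounds):
    """Each increment of i by the left thread is undone by the right thread."""
    s = {"i": 0, "x": 0}
    left, right = inc(s), dec(s)
    labels = []
    for _ in range(rounds):
        for thread in (left, left, right, right):
            labels.append(next(thread))
    finished = "x+1" in labels or "x-1" in labels
    return s, finished


print(run_alone(inc))   # ({'i': 10, 'x': 1}, 22)
print(livelock(1000))   # ({'i': 0, 'x': 0}, False)
```

In isolation each method terminates, while under the interleaved schedule neither call reaches its update of x, however many rounds are run.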

Wait-freedom,  lock-freedom,  and  obstruction-freedom  are  progress  properties 
for  non-blocking  implementations,  where  a  delay  of  a  thread  cannot  prevent  other 
threads  from  making  progress.  In  contrast,  deadlock-freedom  and  starvation- 
freedom  are  progress  properties  for  lock-based  implementations.  A  delay  of  a 
thread  holding  a  lock  will  block  other  threads  which  request  the  lock. 

Deadlock-freedom  and  starvation-freedom  are  often  defined  in  terms  of  locks 
and  critical  sections.  Deadlock-freedom  guarantees  that  some  thread  will  succeed 
in  acquiring  the  lock,  and  starvation-freedom  states  that  every  thread  attempting 
to  acquire  the  lock  will  eventually  succeed  [10].  For  example,  a  test-and-set  spin 
lock  is  deadlock-free  but  not  starvation-free.  In  a  concurrent  access,  some  thread 
will  successfully  set  the  bit  and  get  the  lock,  but  there  might  be  a  thread  that 
is  continuously  failing  to  get  the  lock.  Lamport’s  bakery  lock  is  starvation-free. 
It  ensures  that  threads  can  acquire  locks  in  the  order  of  their  requests. 
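
The gap between the two guarantees can be seen on a schedule that is fair yet starves one thread of a test-and-set lock. The Python sketch below is a simulation under assumptions: the lock is a single boolean, each test-and-set, update and unlock is one atomic step, and the names `tas`, `inc` and `fair_but_starving` are hypothetical.

```python
def tas(state):
    """Atomic test-and-set: set the lock bit and return its old value."""
    old = state["lock"]
    state["lock"] = True
    return old


def inc(state):
    """inc() guarded by a test-and-set spin lock; one atomic step per next()."""
    while tas(state):            # TestAndSet_lock(): spin while the bit was set
        yield "spin"
    yield "acquired"
    state["x"] += 1              # x := x + 1
    yield "x+1"
    state["lock"] = False        # TestAndSet_unlock()
    yield "unlocked"


def fair_but_starving(rounds):
    """Thread B is scheduled in every round, but only while A holds the lock."""
    state = {"lock": False, "x": 0}
    b = inc(state)               # thread B: one inc() call that never completes
    b_labels = []
    for _ in range(rounds):
        a = inc(state)           # thread A: a fresh inc() call in each round
        next(a)                  # A acquires the lock
        b_labels.append(next(b)) # B tries and fails
        next(a)                  # A increments x
        next(a)                  # A releases the lock
    return state["x"], b_labels


print(fair_but_starving(4))  # (4, ['spin', 'spin', 'spin', 'spin'])
```

Some call finishes in every round, so the behavior is consistent with deadlock-freedom, but thread B never acquires the lock. A lock that serves requests in order, as the bakery lock does, excludes this schedule.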

However, as noted by Herlihy and Shavit [11], the above definitions based on
locks are unsatisfactory, because it is often difficult to identify a particular field
in the object as a lock. Instead, they suggest defining them in terms of method
calls. They also notice that the above definitions implicitly assume that every
thread acquiring the lock will eventually release it. This assumption requires fair
scheduling, i.e., every thread is eventually executed.

Following Herlihy and Shavit [11], we say an object is deadlock-free, if in
each fair execution there always exists some method call that can finish. As
an example in Figure 2(d), we use a test-and-set lock to synchronize the
increments of the counter. Since some thread is guaranteed to acquire the test-and-set
lock, the method call of that thread is guaranteed to finish. Thus the object is
deadlock-free. Similarly, a starvation-free object guarantees that every method
call can finish in fair executions. Figure 2(e) shows a counter implemented with
Lamport’s bakery lock. It is starvation-free since the bakery lock ensures that
every thread can acquire the lock and hence every method call can eventually
complete.

            Wait-Free   Lock-Free   Obstruction-Free    Deadlock-Free   Starvation-Free
$\Pi_A$     (t, Div.)   Div.        Div.                Div.            (t, Div.)
$\Pi$       (t, Div.)   Div.        Div. if Isolating   Div. if Fair    (t, Div.) if Fair

Table 1: Characterizing Progress Properties via Contextual Refinements $\Pi \sqsubseteq \Pi_A$


Our  Results.  None  of  the  above  definitions  of  the  five  progress  properties 
describes  their  guarantees  regarding  the  behaviors  of  client  code.  In  this  paper, 
we  define  several  contextual  refinements  to  characterize  the  effects  over  client 
behaviors  when  the  client  uses  objects  with  some  progress  properties.  We  show 
that  linearizability  together  with  a  progress  property  is  equivalent  to  a  certain 
termination-sensitive  contextual  refinement.  Table  1  summarizes  our  results. 

For each progress property, the new contextual refinement $\Pi \sqsubseteq \Pi_A$ is
defined with respect to a divergence behavior and/or a specific scheduling at the
implementation level (the third row in Table 1) and at the abstract side (the
second row), in addition to the I/O events in the basic contextual refinement for
linearizability.

—  For wait-freedom, we need to observe the divergence of each individual thread
t, represented by “(t, Div.)” in Table 1, at both the concrete and the abstract
levels. We show that, if the thread t of a client program diverges when the
client uses a linearizable and wait-free object $\Pi$, then thread t must also
diverge when using $\Pi_A$ instead.

—  The case for lock-freedom is similar, except that we now consider the
divergence behaviors of the whole client program rather than individual threads
(denoted by “Div.” in Table 1). If a client diverges when using a linearizable
and lock-free object $\Pi$, it must also diverge when it uses $\Pi_A$ instead.

—  For obstruction-freedom, we consider the behaviors of isolating executions
at the concrete side (denoted by “Div. if Isolating” in Table 1). In those
executions, eventually only one thread is running. We show that, if a client
diverges in an isolating execution when it uses a linearizable and
obstruction-free object $\Pi$, it must also diverge in some abstract execution.

—  For deadlock-freedom, we only care about fair executions at the concrete
level (denoted by “Div. if Fair” in Table 1).

—  For starvation-freedom, we observe the divergence of each individual thread
at both levels and restrict our considerations to fair executions for the
concrete side (“(t, Div.) if Fair” in Table 1). Any thread using $\Pi$ can diverge in
a fair execution, only if it also diverges in some abstract execution.

These new contextual refinements can characterize linearizable objects with
progress properties. We will formalize the results and give examples in Section 5.

(Expr)   $E ::= x \mid n \mid E + E \mid \ldots$
(BExp)   $B ::= \mathbf{true} \mid \mathbf{false} \mid E = E \mid\ !B \mid \ldots$
(Instr)  $c ::= x := E \mid x := [E] \mid [E] := E \mid \mathbf{print}(E) \mid x := \mathbf{cons}(E, \ldots, E) \mid \mathbf{dispose}(E) \mid \ldots$
(Stmt)   $C ::= \mathbf{skip} \mid c \mid x := f(E) \mid \mathbf{return}\ E \mid \mathbf{fret}(n) \mid \mathbf{noret} \mid \mathbf{end} \mid \langle C \rangle \mid C; C \mid \mathbf{if}\ (B)\ C\ \mathbf{else}\ C \mid \mathbf{while}\ (B)\{C\}$
(Prog)   $W ::= \mathbf{skip} \mid \mathbf{let}\ \Pi\ \mathbf{in}\ C \,\|\, \ldots \,\|\, C$
(ODecl)  $\Pi ::= \{f_1 \leadsto (x_1, C_1), \ldots, f_n \leadsto (x_n, C_n)\}$

Fig. 3: Syntax of the Programming Language

3  Formal  Setting  and  Linearizability 

In this section, we formalize linearizability and show its equivalence to a
contextual refinement that preserves safety properties. This equivalence is the basis of
our new results that relate contextual refinement and progress properties.


Language and Semantics. We use a language similar to that in previous work of
Liang and Feng [13]. As shown in Figure 3, a program $W$ consists of several
client threads that run in parallel. Each thread could call the methods declared
in the object $\Pi$. A method $f$ is defined as a pair $(x, C)$, where $x$ is the formal
argument and $C$ is the method body. We write $f \leadsto (x, C)$. The object $\Pi$ could
be either concrete with fine-grained code that we want to verify, or abstract
(usually denoted as $\Pi_A$ in the following) that we consider as the specification.
For the latter case, each method body should be an atomic operation of the form
$\langle C \rangle$ and it should be always safe to execute it. For simplicity, we assume there
is only one object in the program $W$ and each method takes one argument only.
However, it is easy to extend our work to multiple objects and arguments.

We  use  the  command  noret  at  the  end  of  methods  that  terminate  but  do 
not  execute  return  E.  It  is  automatically  appended  to  the  method  code  and 
is  not  supposed  to  be  used  by  programmers.  The  command  return  E  will  first 
calculate  the  return  value  n  and  reduce  to  fret(n),  another  runtime  command 
automatically  generated  during  executions.  We  separate  the  evaluation  of  E  from 
returning  its  value  n  to  the  client,  to  allow  interference  between  the  two  steps. 
Note that the atomic block $\langle C \rangle$ may contain the command return E. In that
case, $\langle C \rangle$ would also reduce to fret(n).

To  discuss  progress  properties  later,  we  introduce  an  auxiliary  command  end. 
It  is  a  special  marker  that  can  be  added  at  the  end  of  a  thread,  but  should  not 
be  used  directly  by  programmers.  Other  commands  are  mostly  standard.  Clients 
can use print(E) to produce observable external events. We do not allow the
object’s  methods  to  produce  external  events.  To  simplify  the  semantics,  we  also 
assume  there  are  no  nested  method  calls. 

Figure 4 defines program states and event traces. We partition a global state
$\mathcal{S}$ into the client memory $\sigma_c$, the object $\sigma_o$, and a thread pool $\mathcal{K}$. A client can
only access the client memory $\sigma_c$, unless it calls object methods. The thread pool
maps each thread ID $t$ to its local call stack frame. A call stack $\kappa$ could be either
empty ($\circ$) when the thread is not executing a method, or a triple $(\sigma_l, x, C)$, where
$\sigma_l$ maps the method’s formal argument and local variables to their values, $x$ is
the caller’s variable to receive the return value, and $C$ is the caller’s remaining
code to be executed after the method returns. To give a thread-local semantics,
we also define the thread local view $s$ of the state that only includes one call
stack.

(ThrdID)    $t \in \mathit{Nat}$
(Mem)       $\sigma \in (\mathit{PVar} \cup \mathit{Nat}) \rightharpoonup \mathit{Int}$
(CallStk)   $\kappa ::= (\sigma_l, x, C) \mid \circ$
(ThrdPool)  $\mathcal{K} ::= \{t_1 \leadsto \kappa_1, \ldots, t_n \leadsto \kappa_n\}$
(PState)    $\mathcal{S} ::= (\sigma_c, \sigma_o, \mathcal{K})$
(LState)    $s ::= (\sigma_c, \sigma_o, \kappa)$
(Evt)       $e ::= (t, f, n) \mid (t, \mathbf{ret}, n) \mid (t, \mathbf{obj}) \mid (t, \mathbf{obj}, \mathbf{abort}) \mid (t, \mathbf{out}, n) \mid (t, \mathbf{clt}) \mid (t, \mathbf{clt}, \mathbf{abort}) \mid (t, \mathbf{term}) \mid (\mathbf{spawn}, n)$
(ETrace)    $T ::= \epsilon \mid e :: T$   (co-inductive)

Fig. 4: States and Event Traces

Figure 5 contains selected rules of the operational semantics. To describe
the operational semantics for threads, we use an execution context $\mathbf{E}$, where
$\mathbf{E} ::= [\,] \mid \mathbf{E}; C$. The execution of code occurs in the hole $[\,]$. The context $\mathbf{E}[C]$
results from placing $C$ into the hole.

We have three kinds of transitions. We write $(W, \mathcal{S}) \overset{e}{\longmapsto} (W', \mathcal{S}')$ for the
top-level program transitions and $(C, s) \xrightarrow{e}_{t,\Pi} (C', s')$ for the transitions of thread $t$
with the object $\Pi$. We also introduce the local transition $(C, \sigma) \rightarrow_t (C', \sigma')$ to
describe a step inside or outside method calls of concurrent objects. It accesses
only object memory and method local variables (for the case inside method calls),
or only client memory (for the other case). We then lift a local transition to a
thread transition that produces an event (t, obj) or (t, clt). All three transitions
also include steps that lead to the error state abort.

We  define  all  the  generated  events  e  in  Figure  4.  A  method  invocation 
event  (t, /,  n)  is  produced  when  thread  t  executes  x  :=  f{E),  where  the  ar¬ 
gument  E's  value  is  n.  A  return  (t,ret,n)  is  produced  with  the  return  value 
n.  print(7?)  generates  an  output  (t,out,n),  and  end  generates  a  termination 
marker  (t,term).  Other  steps  generate  either  normal  object  actions  (t,obj)  (for 
steps  inside  method  calls)  or  silent  client  actions  (t,  clt)  (for  client  steps  other 
than  print(7?)).  For  transitions  leading  to  the  error  state  abort,  fault  events 
are  produced:  (t,  obj,  abort)  by  the  object  method  code  and  (t,  clt,  abort)  by 
the  client  code.  We  also  introduce  an  auxiliary  event  (spawn,  n)  to  represent 
spawning  n  threads.  It  is  automatically  inserted  at  the  beginning  of  a  generated 
event  trace,  according  to  the  total  number  of  threads  in  the  program.^  Note  that 
in  this  paper,  we  follow  Herlihy  and  Wing  [12]  and  model  dynamic  thread  cre¬ 
ation  by  simply  treating  each  child  thread  as  an  additional  thread  that  executes 
no  operations  before  being  created.  Outputs  and  faults  are  observable  events. 
We  write  tid(e)  for  the  thread  ID  in  the  event  e.  The  predicate  is_clt(e)  states 

®  The  spawning  event  (spawn,  n)  is  newly  introduced  in  this  TR.  It  helps  to  hide  the 
parameter  of  the  total  number  of  threads  in  the  fairness  definition  in  the  submitted 
version,  and  to  formulate  the  alternative  definitions  of  progress  properties. 
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(Ci,  (ctcCTo,  A:(i)))  - Yi^n  (C'i,  (cr',CT',K'))) 


(let  77  in  C'i  || . . .  Ci . . .  ||  C7„,  (a^,  7C))  (let  77  in  Ci  || . . .  C' . . .  ||  C„,  (a' ,  /C{i k'})) 

(a)  Program  Transitions 

n{f)  =  {y,C)  |7;|<^„=n  X  G  dom{ac)  k  =  ({«/ -^  n},  x,  E[  skip]) 

(E[x  :=  /(E)  I,  (ac,ao,o))  (C;noret,  ((Tc,(To,k)) 

/  0  dom{n)  or  |E|o-e  undefined  or  x  ^  dom{ac) 

(E[x  :=  /(E)  ],  (o-c,  (Jo,  o))  (t.cit.abort)^^^^  abort 


K  =  {cri,X,  C)  CTc  =  crc{x  n} 

(t,ret,n) 


(fret(n),  (ctcCTo.k))  (C,  (ct' ,  (Jo,  o))  (end,  s)  -  (skip,  s) 

_ m..  =  n _ 

(E[print(E)  ],  (ctc,  cto,  o))  (E[skip],  (ctc,  CTo,  o)) 

(C,  (Jo  tbi  (Ti)  — >t  (C',  (Jo  ttl  a'l)  dom{ai)  =  dom{a'i) 

(C,  ((7c,(Jo,  ((Ji,X,Cc)))  (E',  (cTc,  (Jo,  {a'i,X,Cc))) 

(C,  (Jo  ttl  (Ji)  — >t  abort  (C,  (Jo)  — >t  (C',  cr') 


(C,  ((Jo,  (Jo,  ((Ji,x,Co)))  ^  abort  (C,  ((Jo,(Jo,o))  (E',  ((Jo,(Jo,o)) 

(b)  Thread  Transitions 
lElo  =  n 


(E[ return  E],(j)  — >t  (fret(n),  (j)  (noret,(j)  — >t  abort 

(C,  (j)  — >t(skip,  (j')  (E,  (j)  — >  *  (fret(n),  (j')  (E,  (j) — >*  abort 


(E[  (C)  ],  (j)  — >t  (E[skip],CT')  (E[  (C)  ],  cr)  — >t  (fret(n),  a')  (E[  (C)  ],(j)  — >t  abort 

(c)  Local  Thread  Transitions 
Fig.  5  :  Selected  Rules  of  Operational  Semantics 
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that  the  event  e  is  either  a  silent  client  action,  an  output,  or  a  client  fault.  We 
write  is_inv(e)  and  is_ret(e)  to  denote  that  the  event  e  is  a  method  invocation 
and  a  return,  respectively.  The  predicate  is_res(e)  denotes  a  return  or  an  object 
fault,  and  is_abt(e)  denotes  a  fault  of  the  object  or  the  client.  Other  predicates 
are  similar  and  summarized  below. 

— is_inv(e) iff there exist t, f and n such that e = (t, f, n);

— is_ret(e) iff there exist t and n' such that e = (t, ret, n');

— is_obj_abt(e) iff there exists t such that e = (t, obj, abort);

— is_res(e) iff either is_ret(e) or is_obj_abt(e) holds;

— is_obj(e) iff either e = (_, obj) or is_inv(e) or is_res(e) holds;

— is_clt_abt(e) iff there exists t such that e = (t, clt, abort);

— is_abt(e) iff either is_obj_abt(e) or is_clt_abt(e) holds;

— is_clt(e) iff there exist t and n such that either e = (t, clt) or e = (t, out, n)
or e = (t, clt, abort) holds.
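As an illustration, the predicates above can be written as small Python functions. This is only a sketch: the tuple encoding of events and the assumption that method names never collide with the reserved tags are ours and are not part of the formal development.

```python
# Hypothetical tuple encoding of events (an assumption of this sketch):
#   (t, f, n)            invocation of method f with argument n by thread t
#   (t, "ret", n)        return of value n
#   (t, "obj")           silent object step
#   (t, "obj", "abort")  object fault
#   (t, "clt")           silent client step
#   (t, "out", n)        output
#   (t, "clt", "abort")  client fault
RESERVED = {"ret", "obj", "clt", "out"}


def is_inv(e):
    # Assumes method names are never one of the reserved tags.
    return len(e) == 3 and e[1] not in RESERVED


def is_ret(e):
    return len(e) == 3 and e[1] == "ret"


def is_obj_abt(e):
    return tuple(e[1:]) == ("obj", "abort")


def is_res(e):
    return is_ret(e) or is_obj_abt(e)


def is_obj(e):
    return (len(e) == 2 and e[1] == "obj") or is_inv(e) or is_res(e)


def is_clt_abt(e):
    return tuple(e[1:]) == ("clt", "abort")


def is_abt(e):
    return is_obj_abt(e) or is_clt_abt(e)


def is_clt(e):
    silent = len(e) == 2 and e[1] == "clt"
    output = len(e) == 3 and e[1] == "out"
    return silent or output or is_clt_abt(e)
```

Under this encoding every event is classified by its second component, so an invocation is simply any triple whose tag is not reserved.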

An event trace $T$ is a finite or infinite sequence of events. We write $T(i)$
for the $i$-th event of $T$. last($T$) is the last event in a finite $T$. The trace $T(1..i)$
is the sub-trace $T(1), \ldots, T(i)$ of $T$, and $|T|$ is the length of $T$ ($|T| = \omega$ if $T$
is infinite). The trace $T|_t$ represents the sub-trace of $T$ consisting of all events
whose thread ID is $t$. We generate event traces from executions in Figure 6.
We write $\mathcal{T}[\![W,(\sigma_c,\sigma_o)]\!]$ for the prefix-closed set of finite traces produced by the
executions of $W$ with the initial client memory $\sigma_c$, the object $\sigma_o$, and empty
call stacks for all threads. Similarly, we write $\mathcal{T}_\omega[\![W,(\sigma_c,\sigma_o)]\!]$ for the finite or
infinite event traces produced by complete executions. In the definitions, we use
the notation $\_ \overset{T}{\longmapsto}{}^{*} \_$ for zero- or multiple-step program transitions that generate
the trace $T$. Similarly, $\_ \overset{T}{\longmapsto}{}^{\omega} \cdot$ denotes the existence of an infinite $T$-labelled
execution. Note that by using $\lfloor W \rfloor$, end is automatically appended at the end
of each thread in $W$ to explicitly mark the termination of a thread. Using $\lfloor T \rfloor_W$,
we insert the spawning event (spawn, $n$) at the beginning of $T$, where $n$ is the
total number of threads in $W$. Then we can use tnum($T$) to get the number
of threads in the program that generates $T$. Figure 6 also shows various ways to
get histories and observable behaviors of a program, which we will explain later.


Linearizability and Basic Contextual Refinement. Linearizability [12] is
defined using histories. Histories are special event traces consisting only of method
invocations, method returns, and object faults.

We say a response $e_2$ matches an invocation $e_1$, denoted as match($e_1$, $e_2$), iff
they have the same thread ID.

$$\mathsf{match}(e_1, e_2) \;\stackrel{\text{def}}{=}\; \mathsf{is\_inv}(e_1) \land \mathsf{is\_res}(e_2) \land (\mathsf{tid}(e_1) = \mathsf{tid}(e_2))$$

A history $T$ is sequential, i.e., seq($T$), iff the first event of $T$ is an invocation, and
each invocation, except possibly the last, is immediately followed by a matching
response. It is inductively defined as follows.

$$\frac{}{\mathsf{seq}(\epsilon)} \qquad \frac{\mathsf{is\_inv}(e)}{\mathsf{seq}(e :: \epsilon)} \qquad \frac{\mathsf{match}(e_1, e_2) \quad \mathsf{seq}(T)}{\mathsf{seq}(e_1 :: e_2 :: T)}$$
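$$\mathcal{T}[\![W,(\sigma_c,\sigma_o)]\!] \;\stackrel{\text{def}}{=}\; \{\lfloor T \rfloor_W \mid \exists W', S'.\ (\lfloor W \rfloor,(\sigma_c,\sigma_o,\circledast)) \overset{T}{\longmapsto}{}^{*} (W',S') \;\lor\; (\lfloor W \rfloor,(\sigma_c,\sigma_o,\circledast)) \overset{T}{\longmapsto}{}^{*} \mathbf{abort}\}$$

$$\mathcal{T}_\omega[\![W,(\sigma_c,\sigma_o)]\!] \;\stackrel{\text{def}}{=}\; \{\lfloor T \rfloor_W \mid (\lfloor W \rfloor,(\sigma_c,\sigma_o,\circledast)) \overset{T}{\longmapsto}{}^{\omega} \cdot \;\lor\; (\lfloor W \rfloor,(\sigma_c,\sigma_o,\circledast)) \overset{T}{\longmapsto}{}^{*} (\mathbf{skip},\_) \;\lor\; (\lfloor W \rfloor,(\sigma_c,\sigma_o,\circledast)) \overset{T}{\longmapsto}{}^{*} \mathbf{abort}\}$$

$$\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor \;\stackrel{\text{def}}{=}\; \mathbf{let}\ \Pi\ \mathbf{in}\ (C_1; \mathbf{end}) \parallel \ldots \parallel (C_n; \mathbf{end})$$

$$\lfloor T \rfloor_{(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n)} \;\stackrel{\text{def}}{=}\; (\mathbf{spawn}, n) :: T \qquad \mathsf{tnum}((\mathbf{spawn}, n) :: T) \;\stackrel{\text{def}}{=}\; n$$

$$\circledast \;\stackrel{\text{def}}{=}\; \{t_1 \leadsto \circ, \ldots, t_n \leadsto \circ\} \qquad \mathsf{div\_tids}(T) \;\stackrel{\text{def}}{=}\; \{t \mid |(T|_t)| = \omega\}$$

$$\mathsf{iso}(T) \ \text{iff}\ |T| = \omega \Longrightarrow \exists t, i.\ (\forall j.\ j > i \Longrightarrow \mathsf{tid}(T(j)) = t)$$

$$\mathsf{fair}(T) \ \text{iff}\ |T| = \omega \Longrightarrow \forall t \in [1..\mathsf{tnum}(T)].\ |(T|_t)| = \omega \lor \mathsf{last}(T|_t) = (t, \mathbf{term})$$

$$\mathcal{H}[\![W,(\sigma_c,\sigma_o)]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_hist}(T) \mid T \in \mathcal{T}[\![W,(\sigma_c,\sigma_o)]\!]\}$$

$$\mathcal{O}[\![W,(\sigma_c,\sigma_o)]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_obsv}(T) \mid T \in \mathcal{T}[\![W,(\sigma_c,\sigma_o)]\!]\}$$

$$\mathcal{O}_{t\omega}[\![W,(\sigma_c,\sigma_o)]\!] \;\stackrel{\text{def}}{=}\; \{(\mathsf{get\_obsv}(T), \mathsf{div\_tids}(T)) \mid T \in \mathcal{T}_\omega[\![W,(\sigma_c,\sigma_o)]\!]\}$$

$$\mathcal{O}_{\omega}[\![W,(\sigma_c,\sigma_o)]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W,(\sigma_c,\sigma_o)]\!]\}$$

$$\mathcal{O}_{i\omega}[\![W,(\sigma_c,\sigma_o)]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W,(\sigma_c,\sigma_o)]\!] \land \mathsf{iso}(T)\}$$

$$\mathcal{O}_{f\omega}[\![W,(\sigma_c,\sigma_o)]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_obsv}(T) \mid \exists n.\ T \in \mathcal{T}_\omega[\![W,(\sigma_c,\sigma_o)]\!] \land \mathsf{fair}(T)\}$$

Fig. 6: Generation of Event Traces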

Then $T$ is well-formed iff, for all $t$, $T|_t$ is sequential.

$$\mathsf{well\_formed}(T) \;\stackrel{\text{def}}{=}\; \forall t.\ \mathsf{seq}(T|_t)$$

$T$ is complete iff it is well-formed and every invocation has a matching response.
An invocation is pending if no matching response follows it. We write pend_inv($T$)
for the set of pending invocations in $T$.

$$\mathsf{pend\_inv}(T) \;\stackrel{\text{def}}{=}\; \{e \mid \exists i.\ e = T(i) \land \mathsf{is\_inv}(e) \land (\forall j.\ i < j \le |T| \Longrightarrow \lnot \mathsf{match}(e, T(j)))\}$$
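The definitions of seq, well-formedness and pending invocations translate directly to finite histories. The sketch below assumes a hypothetical encoding of events as Python tuples, with invocations (t, f, n), returns (t, "ret", n) and object faults (t, "obj", "abort"); it returns pending invocations as a list rather than a set so that their order is visible.

```python
RESERVED = {"ret", "obj", "clt", "out"}  # tags of this sketch's encoding


def is_inv(e):
    return len(e) == 3 and e[1] not in RESERVED


def is_res(e):
    return len(e) == 3 and (e[1] == "ret" or tuple(e[1:]) == ("obj", "abort"))


def match(e1, e2):
    # A response matches an invocation iff both belong to the same thread.
    return is_inv(e1) and is_res(e2) and e1[0] == e2[0]


def seq(T):
    # Mirrors the three inductive rules: empty, single invocation, matched pair.
    if len(T) == 0:
        return True
    if len(T) == 1:
        return is_inv(T[0])
    return match(T[0], T[1]) and seq(T[2:])


def restrict(T, t):
    # The sub-trace T|t of events whose thread ID is t.
    return [e for e in T if e[0] == t]


def well_formed(T):
    return all(seq(restrict(T, t)) for t in {e[0] for e in T})


def pend_inv(T):
    return [e for i, e in enumerate(T)
            if is_inv(e) and not any(match(e, r) for r in T[i + 1:])]
```

For the history `[(1, "inc", 0), (2, "dec", 0), (1, "ret", 0)]`, the whole history is well-formed but not sequential, and the call of thread 2 is pending.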

We handle pending invocations in an incomplete history $T$ following the standard
linearizability definition [12]: we append zero or more return events to $T$,
and drop the remaining pending invocations. Then we get a set of complete histories,
which is denoted by completions($T$). Formally, we define completions($T$)
as follows.

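Definition 1 (Extensions of a history). extensions($T$) is a set of well-formed
histories where we extend $T$ by appending successful return events:

$$\frac{\mathsf{well\_formed}(T)}{T \in \mathsf{extensions}(T)} \qquad \frac{T' \in \mathsf{extensions}(T) \quad \mathsf{is\_ret}(e) \quad \mathsf{well\_formed}(T' :: e)}{T' :: e \in \mathsf{extensions}(T)}$$

Or equivalently,

$$\mathsf{extensions}(T) \;\stackrel{\text{def}}{=}\; \{T' \mid \mathsf{well\_formed}(T') \land \exists T_{ok}.\ T' = T :: T_{ok} \land \forall i.\ \mathsf{is\_ret}(T_{ok}(i))\}.$$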




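Definition 2 (Completions of a history). truncate($T$) is the maximal complete
sub-history of $T$, which is inductively defined by dropping the pending
invocations in $T$:

$$\mathsf{truncate}(\epsilon) \;\stackrel{\text{def}}{=}\; \epsilon$$

$$\mathsf{truncate}(e :: T) \;\stackrel{\text{def}}{=}\; \begin{cases} e :: \mathsf{truncate}(T) & \text{if } \mathsf{is\_res}(e) \text{ or } \exists i.\ \mathsf{match}(e, T(i)) \\ \mathsf{truncate}(T) & \text{otherwise} \end{cases}$$

Then completions($T$) $\stackrel{\text{def}}{=}$ {truncate($T'$) | $T' \in$ extensions($T$)}. It is a set of
histories without pending invocations.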
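A minimal sketch of truncate on finite histories follows, again assuming a hypothetical tuple encoding of events ((t, f, n) for invocations, (t, "ret", n) for returns, (t, "obj", "abort") for object faults). The loop is an iterative reading of the two-case recursive definition: an event is kept when it is a response or when a matching response occurs later.

```python
RESERVED = {"ret", "obj", "clt", "out"}  # tags of this sketch's encoding


def is_inv(e):
    return len(e) == 3 and e[1] not in RESERVED


def is_res(e):
    return len(e) == 3 and (e[1] == "ret" or tuple(e[1:]) == ("obj", "abort"))


def match(e1, e2):
    return is_inv(e1) and is_res(e2) and e1[0] == e2[0]


def truncate(T):
    kept = []
    for i, e in enumerate(T):
        # Keep responses, and invocations that are answered later in T.
        if is_res(e) or any(match(e, r) for r in T[i + 1:]):
            kept.append(e)
    return kept
```

Applied to `[(1, "inc", 0), (2, "dec", 0), (1, "ret", 0)]`, the pending call of thread 2 is dropped and the completed call of thread 1 remains.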


Then we can formulate the linearizability relation between well-formed histories,
which is a core notion used in the linearizability definition of an object.

Definition 3 (Linearizable Histories). $T \preceq_{\mathsf{lin}} T'$ iff

1. $\forall t.\ T|_t = T'|_t$;

2. there exists a bijection $\pi : \{1, \ldots, |T|\} \to \{1, \ldots, |T'|\}$ such that $\forall i.\ T(i) = T'(\pi(i))$
and $\forall i, j.\ i < j \land \mathsf{is\_ret}(T(i)) \land \mathsf{is\_inv}(T(j)) \Longrightarrow \pi(i) < \pi(j)$.

That is, $T$ is linearizable w.r.t. $T'$ if the latter is a permutation of the former,
preserving the order of events in the same threads and the order of the
non-overlapping method calls. Then an object is linearizable iff all its concurrent
histories after completions are linearizable w.r.t. some legal sequential histories.
We use $\Pi_A \rhd (\sigma_a, T')$ to mean that $T'$ is a legal sequential history generated by
any client using the specification $\Pi_A$ with an initial abstract object $\sigma_a$.

$$\Pi_A \rhd (\sigma_a, T) \;\stackrel{\text{def}}{=}\; \exists n, C_1, \ldots, C_n, \sigma_c.\ T \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a)]\!] \land \mathsf{seq}(T)$$

As defined in Figure 6, we use $\mathcal{H}[\![W,(\sigma_c,\sigma_o)]\!]$ to generate histories from $W$,
where get_hist($T$) projects the event trace $T$ to the sub-history.
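The relation of Definition 3 can be checked by brute force on small histories. The sketch below assumes a hypothetical tuple encoding of events ((t, f, n) for invocations and (t, "ret", n) for returns) and enumerates every candidate bijection, so it is only practical for a handful of events.

```python
from itertools import permutations

RESERVED = {"ret", "obj", "clt", "out"}  # tags of this sketch's encoding


def is_inv(e):
    return len(e) == 3 and e[1] not in RESERVED


def is_ret(e):
    return len(e) == 3 and e[1] == "ret"


def restrict(T, t):
    return [e for e in T if e[0] == t]


def lin(T, Tp):
    """Check T <=_lin Tp for finite histories by trying every bijection."""
    if len(T) != len(Tp):
        return False
    threads = {e[0] for e in T} | {e[0] for e in Tp}
    # Condition 1: both histories agree on every per-thread projection.
    if any(restrict(T, t) != restrict(Tp, t) for t in threads):
        return False
    n = len(T)
    for pi in permutations(range(n)):
        same_events = all(T[i] == Tp[pi[i]] for i in range(n))
        # Condition 2: a return before an invocation keeps that order.
        order_kept = all(pi[i] < pi[j]
                         for i in range(n) for j in range(i + 1, n)
                         if is_ret(T[i]) and is_inv(T[j]))
        if same_events and order_kept:
            return True
    return False
```

With two overlapping calls, either sequential order is acceptable; once the first call returns before the second is invoked, only the matching order remains.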

Definition 4 (Linearizability of Objects). The object's implementation $\Pi$ is
linearizable w.r.t. $\Pi_A$ under a refinement mapping $\varphi$, denoted by $\Pi \preceq_{\varphi} \Pi_A$, iff

$$\forall n, C_1, \ldots, C_n, \sigma_c, \sigma_o, \sigma_a, T.\ T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o)]\!] \land (\varphi(\sigma_o) = \sigma_a) \Longrightarrow \exists T_c, T'.\ T_c \in \mathsf{completions}(T) \land \Pi_A \rhd (\sigma_a, T') \land T_c \preceq_{\mathsf{lin}} T'$$

Here the mapping $\varphi$ relates concrete objects to abstract ones:

$$(\mathit{RefMap}) \quad \varphi \in \mathit{Mem} \rightharpoonup \mathit{AbsObj}$$

The side condition $\varphi(\sigma_o) = \sigma_a$ in the above definition requires the initial concrete
object $\sigma_o$ to be a well-formed data structure representing a valid object $\sigma_a$.

Next we define a contextual refinement between the concrete object and its
specification, which is equivalent to linearizability. Informally, this contextual
refinement states that for any set of client threads, the program $W$ has no more
observable behaviors than the corresponding abstract program. Below we use
$\mathcal{O}[\![W,(\sigma_c,\sigma_o)]\!]$ to represent the set of observable event traces generated during
the executions of $W$ with the initial state $(\sigma_c, \sigma_o)$ (and empty stacks). It is
defined similarly to $\mathcal{H}[\![W,(\sigma_c,\sigma_o)]\!]$ in Figure 6, but now the traces consist of
observable events only (outputs, client faults or object faults).
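Definition. An object $\Pi$ satisfies $P$ under a refinement mapping $\varphi$, $P_\varphi(\Pi)$, iff

$$\forall n, C_1, \ldots, C_n, S, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), S]\!] \land (S \in \mathit{dom}(\varphi)) \Longrightarrow P(T).$$

$$\mathcal{T}_\omega[\![W,S]\!] \;\stackrel{\text{def}}{=}\; \{(\mathbf{spawn}, |W|) :: T \mid (\lfloor W \rfloor, S) \overset{T}{\longmapsto}{}^{\omega} \cdot \;\lor\; (\lfloor W \rfloor, S) \overset{T}{\longmapsto}{}^{*} (\mathbf{skip}, \_) \;\lor\; (\lfloor W \rfloor, S) \overset{T}{\longmapsto}{}^{*} \mathbf{abort}\}$$

$$\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor \;\stackrel{\text{def}}{=}\; \mathbf{let}\ \Pi\ \mathbf{in}\ (C_1; \mathbf{end}) \parallel \ldots \parallel (C_n; \mathbf{end})$$

$$|\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n| \;\stackrel{\text{def}}{=}\; n \qquad \mathsf{tnum}((\mathbf{spawn}, n) :: T) \;\stackrel{\text{def}}{=}\; n$$

$$\mathsf{pend\_inv}(T) \;\stackrel{\text{def}}{=}\; \{e \mid \exists i.\ e = T(i) \land \mathsf{is\_inv}(e) \land \lnot \exists j.\ (j > i \land \mathsf{match}(e, T(j)))\}$$

$$\text{prog-t}(T) \ \text{iff}\ \forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \Longrightarrow \exists j.\ j > i \land \mathsf{match}(e, T(j))$$

$$\text{prog-s}(T) \ \text{iff}\ \forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \Longrightarrow \exists j.\ j > i \land \mathsf{is\_ret}(T(j))$$

$$\mathsf{abt}(T) \ \text{iff}\ \exists i.\ \mathsf{is\_abt}(T(i))$$

$$\mathsf{sched}(T) \ \text{iff}\ |T| = \omega \land \mathsf{pend\_inv}(T) \neq \emptyset \Longrightarrow \exists e.\ e \in \mathsf{pend\_inv}(T) \land |(T|_{\mathsf{tid}(e)})| = \omega$$

$$\mathsf{fair}(T) \ \text{iff}\ |T| = \omega \Longrightarrow \forall t \in [1..\mathsf{tnum}(T)].\ |(T|_t)| = \omega \lor \mathsf{last}(T|_t) = (t, \mathbf{term})$$

$$\mathsf{iso}(T) \ \text{iff}\ |T| = \omega \Longrightarrow \exists t, i.\ (\forall j.\ j > i \Longrightarrow \mathsf{tid}(T(j)) = t)$$

$$\text{wait-free iff } \mathsf{sched} \Longrightarrow \text{prog-t} \lor \mathsf{abt} \qquad \text{starvation-free iff } \mathsf{fair} \Longrightarrow \text{prog-t} \lor \mathsf{abt}$$

$$\text{lock-free iff } \mathsf{sched} \Longrightarrow \text{prog-s} \lor \mathsf{abt} \qquad \text{deadlock-free iff } \mathsf{fair} \Longrightarrow \text{prog-s} \lor \mathsf{abt}$$

$$\text{obstruction-free iff } \mathsf{sched} \land \mathsf{iso} \Longrightarrow \text{prog-t} \lor \mathsf{abt}$$

Fig. 7: Formalizing Progress Properties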

Definition 5 (Basic Contextual Refinement). $\Pi \sqsubseteq_{\varphi} \Pi_A$ iff

$$\forall n, C_1, \ldots, C_n, \sigma_c, \sigma_o, \sigma_a.\ (\varphi(\sigma_o) = \sigma_a) \Longrightarrow \mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o)]\!] \subseteq \mathcal{O}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a)]\!].$$

Following Filipovic et al. [4], we can prove that linearizability is equivalent
to this contextual refinement. We give the proofs in Appendix B.1.

Theorem 1 (Basic Equivalence). $\Pi \preceq_{\varphi} \Pi_A \iff \Pi \sqsubseteq_{\varphi} \Pi_A$.

Theorem 1 allows us to use $\Pi \sqsubseteq_{\varphi} \Pi_A$ to identify linearizable objects. However,
we cannot use it to observe progress properties of objects. For the following
example, $\Pi \sqsubseteq_{\varphi} \Pi_A$ holds although no concrete method call of f could finish (we
assume this object contains a method f only).

$\Pi$(f) : while(true) skip;      $\Pi_A$(f) : skip;      C : print(1); f(); print(1);

The reason is that $\Pi \sqsubseteq_{\varphi} \Pi_A$ considers a prefix-closed set of event traces at the
abstract side. For the above client C, the observable behaviors of let $\Pi$ in C
can all be found in the prefix-closed set of behaviors produced by let $\Pi_A$ in C.
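The effect of prefix closure in this example can be made concrete with a few lines of Python. The sketch below hard-codes the observable outputs of the single-threaded client C (an assumption made for illustration, not something computed from the operational semantics): the concrete program prints 1 and then loops inside f, while the abstract program prints 1 twice.

```python
def prefixes(trace):
    """All prefixes of a finite observable trace, as a set of tuples."""
    return {tuple(trace[:k]) for k in range(len(trace) + 1)}


# Prefix-closed observations: the basic refinement compares these sets.
concrete_prefix_closed = prefixes([1])      # print(1); then f() never returns
abstract_prefix_closed = prefixes([1, 1])   # print(1); f(); print(1)

# Observations of complete executions only (no prefix closure).
concrete_complete = {(1,)}
abstract_complete = {(1, 1)}

basic_refinement_holds = concrete_prefix_closed <= abstract_prefix_closed
complete_refinement_holds = concrete_complete <= abstract_complete
```

Here `basic_refinement_holds` is true even though f never returns, whereas comparing only complete executions distinguishes the two programs.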

4  Formalizing  Progress  Properties 

We define progress in Figure 7 as properties over both event traces $T$ and object
implementations $\Pi$. We say an object implementation $\Pi$ has a progress property
$P$ iff all its event traces have the property. Here we use $\mathcal{T}_\omega$ to generate the event
traces. Its definition in Figure 7 is similar to $\mathcal{T}[\![W,S]\!]$ of Figure ??, but $\mathcal{T}_\omega[\![W,S]\!]$
is for the set of finite or infinite event traces produced by complete executions.
We use $(W,S) \overset{T}{\longmapsto}{}^{\omega} \cdot$ to denote the existence of a $T$-labelled infinite execution.
$(W,S) \overset{T}{\longmapsto}{}^{*} (\mathbf{skip}, \_)$ represents a terminating execution that produces $T$. By
using $\lfloor W \rfloor$, we append end at the end of each thread to explicitly mark the
termination of the thread. We also insert the spawning event (spawn, $n$) at the
beginning of $T$, where $n$ is the number of threads in $W$. Then we can use tnum($T$)
to get the number $n$, which is needed to define fairness, as shown below.

$$\text{lock-free} \iff \text{wait-free} \lor \text{prog-s} \qquad \text{starvation-free} \iff \text{wait-free} \lor \lnot\mathsf{fair}$$

$$\text{obstruction-free} \iff \text{lock-free} \lor \lnot\mathsf{iso} \qquad \text{deadlock-free} \iff \text{lock-free} \lor \lnot\mathsf{fair}$$

Fig. 8: Relationships between Progress Properties

Before  formulating  each  progress  property  over  event  traces,  we  first  define 
some  auxiliary  properties  in  Figure  7.  prog-t(T)  guarantees  that  every  method 
call  in  T  eventually  finishes.  prog-s(T)  guarantees  that  some  pending  method 
call  finishes.  Different  from  prog-t,  the  return  event  T(j)  in  prog-s  does  not  have 
to  be  a  matching  return  of  the  pending  invocation  e.  abt(T)  says  that  T  ends 
with  a  fault  event. 

There are three useful conditions on scheduling. The basic requirement for
a good schedule is sched. If $T$ is infinite and there exist pending calls, then at
least one pending thread should be scheduled infinitely often. In fact, there are
two possible reasons causing a method call of thread $t$ to pend. Either $t$ is no
longer scheduled, or it is always scheduled but the method call never finishes.
sched rules out the bad schedule where no thread with an invoked method is
active. For instance, the following infinite trace does not satisfy sched.

$$(t_1, f_1, n_1) :: (t_2, f_2, n_2) :: (t_1, \mathbf{obj}) :: (t_3, \mathbf{clt}) :: (t_3, \mathbf{clt}) :: (t_3, \mathbf{clt}) :: \ldots$$

If $T$ is infinite, fair($T$) requires that every non-terminating thread be scheduled
infinitely often, and iso($T$) requires that eventually only one thread be scheduled. We
can see that a fair schedule is a good schedule satisfying sched.

At the bottom of Figure 7 we define the progress properties formally. We
omit the parameter $T$ in the formulae to simplify the presentation. An event
trace $T$ is wait-free (i.e., wait-free($T$) holds) if under the good schedule sched, it
guarantees prog-t unless it ends with a fault. lock-free($T$) is similar except that
it guarantees prog-s. Starvation-freedom and deadlock-freedom guarantee prog-t
and prog-s under fair scheduling. Obstruction-freedom guarantees prog-t if some
pending thread is always scheduled (sched) and runs in isolation (iso).
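The trace predicates of Figure 7 can be evaluated mechanically on ultimately periodic traces. The following sketch is ours and rests on several assumptions: an infinite trace is represented as a finite prefix followed by a cycle repeated forever (an empty cycle means a finite trace), events are hypothetical Python tuples such as (t, f, n), (t, "ret", n), (t, "obj") and (t, "term"), the spawn event is replaced by an explicit thread count `n`, and quantifiers over positions are checked on the prefix plus two copies of the cycle, which suffices for traces of this periodic shape.

```python
RESERVED = {"ret", "obj", "clt", "out", "term"}  # tags of this sketch's encoding

def is_inv(e): return len(e) == 3 and e[1] not in RESERVED
def is_ret(e): return len(e) == 3 and e[1] == "ret"
def is_abt(e): return len(e) == 3 and e[2] == "abort" and e[1] in ("obj", "clt")
def is_res(e): return is_ret(e) or tuple(e[1:]) == ("obj", "abort")
def match(e1, e2): return is_inv(e1) and is_res(e2) and e1[0] == e2[0]

class Lasso:
    """The trace prefix . cycle^omega; an empty cycle means a finite trace."""
    def __init__(self, prefix, cycle=(), n=1):
        self.prefix, self.cycle, self.n = list(prefix), list(cycle), n
        self.unrolled = self.prefix + self.cycle + self.cycle
        self.window = len(self.prefix) + len(self.cycle)
        self.infinite = bool(self.cycle)
        self.running = {e[0] for e in self.cycle}  # threads scheduled forever

def pending(events):
    # Pending invocations of a finite trace.
    return [e for k, e in enumerate(events)
            if is_inv(e) and not any(match(e, r) for r in events[k + 1:])]

def pend_inv(T):
    # Pending invocations of the whole (possibly infinite) trace.
    U = T.unrolled
    return [U[k] for k in range(T.window)
            if is_inv(U[k]) and not any(match(U[k], r) for r in U[k + 1:])]

def _prog(T, finishes):
    U = T.unrolled
    return all(any(finishes(e, r) for r in U[i:])
               for i in range(1, T.window + 1) for e in pending(U[:i]))

def prog_t(T): return _prog(T, match)
def prog_s(T): return _prog(T, lambda e, r: is_ret(r))
def abt(T): return any(is_abt(e) for e in T.unrolled)

def sched(T):
    pend = pend_inv(T)
    return not (T.infinite and pend) or any(e[0] in T.running for e in pend)

def fair(T):
    def ok(t):
        mine = [e for e in T.prefix if e[0] == t]
        return t in T.running or (bool(mine) and mine[-1] == (t, "term"))
    return not T.infinite or all(ok(t) for t in range(1, T.n + 1))

def iso(T): return not T.infinite or len(T.running) == 1

def wait_free(T): return not sched(T) or prog_t(T) or abt(T)
def lock_free(T): return not sched(T) or prog_s(T) or abt(T)
def starvation_free(T): return not fair(T) or prog_t(T) or abt(T)
def deadlock_free(T): return not fair(T) or prog_s(T) or abt(T)
def obstruction_free(T): return not (sched(T) and iso(T)) or prog_t(T) or abt(T)

# Thread 1 is stuck inside f while thread 2 keeps completing calls of g.
stuck = Lasso([(1, "f", 0)], [(2, "g", 0), (2, "ret", 0), (1, "obj")], n=2)
# The trace from the text: two pending calls, but only thread 3 keeps running.
bad_schedule = Lasso([(1, "f1", 0), (2, "f2", 0), (1, "obj")], [(3, "clt")], n=3)
```

On `stuck`, some call always finishes but the call of thread 1 never does, so the trace is lock-free and deadlock-free but neither wait-free nor starvation-free. On `bad_schedule`, sched fails, so the sched-based properties hold vacuously.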

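Figure 8 contains lemmas that relate progress properties. For instance, an
event trace is starvation-free iff it is wait-free or not fair. These lemmas give us
the relationship lattice in Figure 1. To close the lattice, we also define a progress
property in the sequential setting. Sequential termination guarantees that every
method call must finish in a trace produced by a sequential client. The formal
definition is given in the companion TR [15], and we prove that it is implied by
each of the five progress properties for concurrent objects.

$$\mathsf{div\_tids}(T) \;\stackrel{\text{def}}{=}\; \{t \mid |(T|_t)| = \omega\}$$

$$\mathcal{O}_{\omega}[\![W,S]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W,S]\!]\}$$

$$\mathcal{O}_{i\omega}[\![W,S]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W,S]\!] \land \mathsf{iso}(T)\}$$

$$\mathcal{O}_{f\omega}[\![W,S]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W,S]\!] \land \mathsf{fair}(T)\}$$

$$\mathcal{O}_{t\omega}[\![W,S]\!] \;\stackrel{\text{def}}{=}\; \{(\mathsf{get\_obsv}(T), \mathsf{div\_tids}(T)) \mid T \in \mathcal{T}_\omega[\![W,S]\!]\}$$

$$\mathcal{O}_{ft\omega}[\![W,S]\!] \;\stackrel{\text{def}}{=}\; \{(\mathsf{get\_obsv}(T), \mathsf{div\_tids}(T)) \mid T \in \mathcal{T}_\omega[\![W,S]\!] \land \mathsf{fair}(T)\}$$

Fig. 9: Generation of Complete Event Traces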

5  Equivalence  to  Contextual  Refinements 

We  extend  the  basic  contextual  refinement  in  Definition  5  to  observe  progress 
as  well  as  linearizability.  For  each  progress  property,  we  carefully  choose  the 
observable  behaviors  at  the  concrete  and  the  abstract  levels. 

5.1  Observable  Behaviors 

In  Figure  9,  we  define  various  observable  behaviors  for  the  termination-sensitive 
contextual  refinements. 

We use $\mathcal{O}_{\omega}[\![W,S]\!]$ to represent the set of observable event traces produced
by complete executions of $(W,S)$. Recall that get_obsv($T$) gets the sub-trace
of $T$ consisting of all the observable events only. Unlike the prefix-closed set
$\mathcal{O}[\![W,S]\!]$, this definition utilizes $\mathcal{T}_\omega[\![W,S]\!]$ (see Figure 7) whose event traces are
all complete and could be infinite. Thus it allows us to observe divergence of the
whole program. $\mathcal{O}_{i\omega}$ and $\mathcal{O}_{f\omega}$ take the complete observable traces of isolating
and fair executions respectively. Here iso($T$) and fair($T$) are defined in Figure 7.

We could also observe divergence of individual threads rather than the whole
program. We define div_tids($T$) to collect the set of threads that diverge in the
trace $T$. Then we write $\mathcal{O}_{t\omega}[\![W,S]\!]$ to get both the observable behaviors and the
diverging threads in the complete executions. $\mathcal{O}_{ft\omega}[\![W,S]\!]$ is defined similarly but
considers fair executions only.

More  on  divergence.  In  general,  divergence  means  non-termination.  For  example, 
we  could  say  that  the  following  two-threaded  program  (5.1)  must  diverge  since 
it  never  terminates. 


x := x + 1;  ∥  while (true) skip;      (5.1)

But for individual threads, divergence is not equivalent to non-termination, since
a non-terminating thread may either have an infinite execution or simply not be
scheduled from some point on due to unfair scheduling. We view only the former
case as divergence. For instance, in an unfair execution, the left thread of (5.1)
may never be scheduled and hence it has no chance to terminate. It does not
diverge. Similarly, for the following program (5.2),

while (true) skip;  ∥  while (true) skip;      (5.2)

the whole program must diverge, but it is possible that a single thread does not
diverge in an execution.

Table 2: Contextual Refinements $\Pi \sqsubseteq^{P}_{\varphi} \Pi_A$ for Progress Properties $P$

    P                   $\Pi \sqsubseteq^{P}_{\varphi} \Pi_A$
    wait-free           $\mathcal{O}_{t\omega} \subseteq \mathcal{O}_{t\omega}$
    lock-free           $\mathcal{O}_{\omega} \subseteq \mathcal{O}_{\omega}$
    obstruction-free    $\mathcal{O}_{i\omega} \subseteq \mathcal{O}_{\omega}$
    deadlock-free       $\mathcal{O}_{f\omega} \subseteq \mathcal{O}_{\omega}$
    starvation-free     $\mathcal{O}_{ft\omega} \subseteq \mathcal{O}_{t\omega}$

5.2  New  Contextual  Refinements  and  Equivalence  Results 

In  Table  2,  we  summarize  the  definitions  of  the  termination-sensitive  contextual 
refinements.  Each  new  contextual  refinement  follows  the  basic  one  in  Definition  5 
but  takes  different  observable  behaviors  as  specified  in  Table  2.  For  example,  the 
contextual  refinement  for  wait-freedom  is  formally  defined  as  follows: 

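$$\Pi \sqsubseteq^{\text{wait-free}}_{\varphi} \Pi_A \ \text{iff}\ \big(\forall n, C_1, \ldots, C_n, S, S_a.\ (\varphi(S) = S_a) \Longrightarrow \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), S]\!] \subseteq \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), S_a]\!]\big).$$

Theorem 2 says that linearizability together with a progress property $P$ is
equivalent to the corresponding contextual refinement $\sqsubseteq^{P}_{\varphi}$.

Theorem 2 (Equivalence). $\Pi \preceq_{\varphi} \Pi_A \land P_{\varphi}(\Pi) \iff \Pi \sqsubseteq^{P}_{\varphi} \Pi_A$, where $P$ is
wait-free, lock-free, obstruction-free, deadlock-free or starvation-free.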

Here we assume the object specification $\Pi_A$ is total, i.e., the abstract operations
never block. We provide the full proofs of our equivalence results in Appendix B.

The contextual refinement for wait-freedom takes $\mathcal{O}_{t\omega}$ at both the concrete
and the abstract levels. The divergence of individual threads as well as I/O
events are treated as observable behaviors. The intuition of the equivalence is as
follows. Since a wait-free object $\Pi$ guarantees that every method call finishes,
we have to blame the client code itself for the divergence of a thread using $\Pi$.
That is, even if the thread uses the abstract object $\Pi_A$, it must still diverge.

As an example, consider the client program (2.1). Intuitively, for any execution
in which the client uses the abstract operations, only the right thread $t_2$
diverges. Thus $\mathcal{O}_{t\omega}$ of the abstract program is a singleton set $\{(\epsilon, \{t_2\})\}$. When
the client uses the wait-free object in Figure 2(a), its $\mathcal{O}_{t\omega}$ set is still $\{(\epsilon, \{t_2\})\}$.
It does not produce more observable behaviors. But if it uses a non-wait-free
object (such as the one in Figure 2(b)), the left thread $t_1$ does not necessarily
finish. The $\mathcal{O}_{t\omega}$ set becomes $\{(\epsilon, \{t_2\}), (\epsilon, \{t_1, t_2\})\}$. It produces more observable
behaviors than the abstract client, breaking the contextual refinement. Thanks
to observing div_tids that collects the diverging threads, we can rule out
non-wait-free objects which may cause more threads to diverge.
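Once the observable sets are known, checking this contextual refinement amounts to a set inclusion. The sketch below hard-codes the three sets discussed above rather than deriving them from the objects in Figure 2; the names `EPS`, `abstract`, `with_wait_free_object` and `with_non_wait_free_object` are ours.

```python
EPS = ()  # the empty observable trace


def refines(concrete, abstract):
    """Concrete observations must not exceed the abstract ones."""
    return concrete <= abstract


# Each observation pairs an observable trace with the set of diverging threads.
abstract = {(EPS, frozenset({"t2"}))}
with_wait_free_object = {(EPS, frozenset({"t2"}))}
with_non_wait_free_object = {
    (EPS, frozenset({"t2"})),
    (EPS, frozenset({"t1", "t2"})),  # the left thread may also fail to finish
}
```

The extra pair in `with_non_wait_free_object` has no counterpart at the abstract level, which is exactly what breaks the refinement.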

$\Pi \sqsubseteq^{\text{lock-free}}_{\varphi} \Pi_A$ takes coarser observable behaviors. We observe the divergence
of the whole client program by using $\mathcal{O}_{\omega}$ at both the concrete and the abstract
levels. Intuitively, a lock-free object $\Pi$ ensures that some method call will finish,
thus the client using $\Pi$ diverges only if there are an infinite number of method
calls. Then it must also diverge when using the abstract object $\Pi_A$.

For example, consider the client (2.1). The whole client program diverges in
every execution both when it uses the lock-free object in Figure 2(b) and when
it uses the abstract one. The set of observable behaviors is $\{\epsilon\}$ at both levels.
On the other hand, the following client must terminate and print out both 1 and
2 in every execution. The set is $\{1 :: 2 :: \epsilon,\ 2 :: 1 :: \epsilon\}$ at both levels.

inc(); print(1);  ∥  dec(); print(2);      (5.3)

Instead, if the client (5.3) uses the non-lock-free object in Figure 2(c), it may
diverge and nothing is printed out. The set becomes $\{\epsilon,\ 1 :: 2 :: \epsilon,\ 2 :: 1 :: \epsilon\}$,
which contains more behaviors than the abstract side. Thus $\Pi \sqsubseteq^{\text{lock-free}}_{\varphi} \Pi_A$ fails.

Obstruction-freedom ensures progress for isolating executions in which eventually
only one thread is running. Correspondingly, $\Pi \sqsubseteq^{\text{obstruction-free}}_{\varphi} \Pi_A$ restricts
our considerations to isolating executions. It takes $\mathcal{O}_{i\omega}$ at the concrete level and
$\mathcal{O}_{\omega}$ at the abstract level.

To understand the equivalence, consider the client (5.3) again. For isolating
executions with the obstruction-free object in Figure 2(c), it must terminate and
print out both 1 and 2. The set at the concrete level is $\{1 :: 2 :: \epsilon,\ 2 :: 1 :: \epsilon\}$, the
same as the set of the abstract side. Non-obstruction-free objects in general
do not guarantee progress for some isolating executions. If the client uses the
object in Figure 2(d) or (e), the $\mathcal{O}_{i\omega}$ set is $\{\epsilon,\ 1 :: 2 :: \epsilon,\ 2 :: 1 :: \epsilon\}$, not a subset of
the abstract $\mathcal{O}_{\omega}$ set. The undesired empty observable trace is produced by unfair
executions, where a thread acquires the lock and gets suspended and then the
other thread keeps requesting the lock forever (it is executed in isolation).

$\Pi \sqsubseteq^{\text{deadlock-free}}_{\varphi} \Pi_A$ uses $\mathcal{O}_{f\omega}$ at the concrete side, ruling out undesired
divergence caused by unfair scheduling. For the client (5.3) with the object in
Figure 2(d) or (e), its $\mathcal{O}_{f\omega}$ set is the same as the set $\mathcal{O}_{\omega}$ at the abstract level.

For $\Pi \sqsubseteq^{\text{starvation-free}}_{\varphi} \Pi_A$, we consider only fair executions at the concrete
level (similar to deadlock-freedom), but observe the divergence of individual
threads rather than the whole program (similar to wait-freedom). It uses $\mathcal{O}_{ft\omega}$
at the concrete side and $\mathcal{O}_{t\omega}$ at the abstract level. For the client (5.3) with the
starvation-free object in Figure 2(e), no thread diverges in any fair execution.
Then the set $\mathcal{O}_{ft\omega}$ of observable behaviors is $\{(1 :: 2 :: \epsilon, \emptyset),\ (2 :: 1 :: \epsilon, \emptyset)\}$, which is
the same as the set $\mathcal{O}_{t\omega}$ at the abstract level.

Observing threaded divergence allows us to distinguish starvation-free objects
from deadlock-free objects. Consider the client (2.1). Under fair scheduling, we
know only the right thread $t_2$ would diverge when using the starvation-free object
in Figure 2(e). The set $\mathcal{O}_{ft\omega}$ is $\{(\epsilon, \{t_2\})\}$. It coincides with the abstract
behaviors $\mathcal{O}_{t\omega}$. But when using the deadlock-free object of Figure 2(d), the $\mathcal{O}_{ft\omega}$
set becomes $\{(\epsilon, \{t_2\}), (\epsilon, \{t_1, t_2\})\}$, breaking the contextual refinement.
6 Related Work and Conclusion


There is a large body of work discussing the five progress properties and the
contextual refinements individually. Our work in contrast studies their relationships,
which have not been considered much before.

Gotsman and Yang [7] propose a new linearizability definition that preserves
lock-freedom, and suggest a connection between lock-freedom and a
termination-sensitive contextual refinement. We do not redefine linearizability here. Instead,
we propose a unified framework to systematically relate all the five progress
properties plus linearizability to various contextual refinements.

Herlihy  and  Shavit  [11]  informally  discuss  all  the  five  progress  properties. 
Our  definitions  in  Section  4  mostly  follow  their  explanations,  but  they  are  more 
formal  and  close  the  gap  between  program  semantics  and  their  history-based 
interpretations.  We  also  notice  that  their  obstruction-freedom  is  inappropriate 
for  some  examples  (see  TR  [15]),  and  propose  a  different  definition  that  is  closer 
to  the  common  intuition  [10].  In  addition,  we  relate  the  progress  properties  to 
contextual  refinements,  which  consider  the  extensional  effects  on  client  behaviors. 

Fossati et al. [5] propose a uniform approach in the π-calculus to formulate
both the standard progress properties and their observational approximations.
Their technical setting is completely different from ours. Also, their observational
approximations for lock-freedom and wait-freedom are strictly weaker than the
standard notions. Their deadlock-freedom and starvation-freedom are not formulated,
and there is no observational approximation given for obstruction-freedom.
In comparison, our framework relates each of the five progress properties (plus
linearizability) to an equivalent contextual refinement.

There are also formulations of progress properties based on temporal logics.
For example, Petrank et al. [16] formalize the three non-blocking properties and
Dongol [3] formalizes all the five progress properties, using linear temporal logics.
Those formulations make it easier to do model checking (e.g., Petrank et al. [16]
also build a tool to model check a variant of lock-freedom), while our contextual
refinement framework is potentially helpful for modular Hoare-style verification.

Conclusion. We have introduced a contextual refinement framework to unify
various progress properties. For linearizable objects, each progress property is
equivalent to a specific termination-sensitive contextual refinement, as summarized
in Table 1. The framework allows us to verify safety and liveness properties
of client programs at a high abstraction level by replacing concrete method
implementations with abstract operations. It also makes it possible to borrow ideas
from existing proof methods for contextual refinements to verify linearizability
and a progress property together, which we leave as future work.
A Comparisons with Herlihy and Shavit's Obstruction-Freedom

Herlihy and Shavit [11] define obstruction-freedom using the notion of uniformly
isolating executions. A trace is uniformly isolating, if "for every k > 0, any thread
that takes an infinite number of steps has an interval where it takes at least k
concrete contiguous steps" [11]. Then, their obstruction-free object guarantees
wait-freedom for every uniformly isolating execution. They also propose a new
progress property, clash-freedom, which guarantees lock-freedom for uniformly
isolating executions.

[Fig. 10: Execution of f() ∥ g() in Example 1]

Below  we  give  an  example  showing  that  their  definition  is  inconsistent  with 
the  common  intuition  of  obstruction-freedom. 


Example 1. The object implementation uses three shared variables: x, a and b.
It provides two methods f and g:

    f() {
      while (a <= x <= b) {
        x++;
        a--;
      }
    }

    g() {
      while (a <= x <= b) {
        x--;
        b++;
      }
    }

We can see that, if f() or g() is eventually executed in isolation (i.e., we suspend
all but one thread), it must return. Thus intuitively this object should be
obstruction-free. It also satisfies our formulation (Definitions ?? and ??).

However, we could construct an execution which is uniformly isolating but
is neither lock-free nor wait-free. Consider the client program f() ∥ g(). It has an
execution shown in Figure 10. Starting from x = 0, a = -1 and b = 1, we
alternately let each thread execute more and more iterations. Then for any k,
we could always find an interval of k iterations for each thread in this execution.
Thus the execution is uniformly isolating. But neither method call finishes. This
execution is neither lock-free nor wait-free. Thus the object does not satisfy Herlihy
and Shavit's obstruction-freedom or clash-freedom definitions.
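The execution described above can be replayed with a short simulation. This is a sketch under two assumptions of ours: each loop iteration (guard check plus body) is treated as one atomic step, and the schedule gives the k-th burst exactly k iterations, alternating between f and g. The class and function names are hypothetical.

```python
class Obj:
    """Shared state of Example 1, starting from x = 0, a = -1, b = 1."""

    def __init__(self):
        self.x, self.a, self.b = 0, -1, 1

    def guard(self):
        # The loop condition a <= x <= b, read as a chained comparison.
        return self.a <= self.x <= self.b

    def f_step(self):
        self.x += 1
        self.a -= 1

    def g_step(self):
        self.x -= 1
        self.b += 1


def uniformly_isolating_run(rounds):
    """Alternate f and g, running k iterations in the k-th burst.

    Returns True if neither loop has exited after `rounds` bursts.
    """
    o = Obj()
    for k in range(1, rounds + 1):
        step = o.f_step if k % 2 == 1 else o.g_step
        for _ in range(k):
            if not o.guard():
                return False
            step()
    return o.guard()


def isolated_run(step_name, limit=1000):
    """Run one method alone; return the number of iterations before it exits."""
    o = Obj()
    step = getattr(o, step_name)
    for n in range(limit):
        if not o.guard():
            return n
        step()
    return None
```

In isolation each method leaves its loop after two iterations, while under the growing alternating schedule both guards stay true for as many bursts as we care to simulate.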


B  Proofs 

In the following proofs, we make the call stacks explicit in the generation of
event traces. For example, we use $\mathcal{H}[\![W,(\sigma_c,\sigma_o,\circledast)]\!]$ instead of $\mathcal{H}[\![W,(\sigma_c,\sigma_o)]\!]$.
We generalize the definitions to allow nonempty call stacks in the initial state,
e.g., we can use $\mathcal{H}[\![W,(\sigma_c,\sigma_o,\mathcal{K})]\!]$.

B.l  Proofs  of  Theorem  1 

To prove the theorem, we utilize the most general client (MGC). Let's assume
$\mathit{dom}(\Pi) = \{f_1, \ldots, f_m\}$. We could use the expression rand() to get a random
(nondeterministic) integer, and rand(m) to get a random integer $r \in [1..m]$.
Then, for any $n$, $\mathsf{MGC}_n$ is defined as follows:

$$\mathsf{MGT} \;\stackrel{\text{def}}{=}\; \mathbf{while}\ (\mathbf{true})\{\ f_{\mathbf{rand}(m)}(\mathbf{rand}());\ \}$$

$$\mathsf{MGC}_n \;\stackrel{\text{def}}{=}\; \parallel_{i \in [1..n]} \mathsf{MGT}$$

Here each thread keeps calling a random method with a random argument. We
also define another kind of "most general clients" which print out arguments and
return values for method calls:

$$\mathsf{MGTp}_t \;\stackrel{\text{def}}{=}\; \mathbf{while}\ (\mathbf{true})\{\ x_t := \mathbf{rand}();\ y_t := \mathbf{rand}(m);\ \mathbf{print}(y_t, x_t);\ z_t := f_{y_t}(x_t);\ \mathbf{print}(z_t);\ \}$$

$$\mathsf{MGCp}_n \;\stackrel{\text{def}}{=}\; \parallel_{t \in [1..n]} \mathsf{MGTp}_t$$

Here $x_t$, $y_t$ and $z_t$ are all local variables for thread $t$. Below we define the MGC
versions of "linearizability" and refinements, and prove they are related to the
standard definitions of linearizability and contextual refinement.

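Definition 6. $\Pi \preceq^{\mathsf{MGC}}_{\varphi} \Pi_A$ iff

$$\forall n, \sigma_o, \sigma_a, T.\ T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \land (\varphi(\sigma_o) = \sigma_a) \Longrightarrow \exists T_c, T_a.\ T_c \in \mathsf{completions}(T) \land \Pi_A \rhd^{\mathsf{MGC}}_{n} (\sigma_a, T_a) \land T_c \preceq_{\mathsf{lin}} T_a$$

where

$$\Pi_A \rhd^{\mathsf{MGC}}_{n} (\sigma_a, T) \;\stackrel{\text{def}}{=}\; T \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!] \land \mathsf{seq}(T).$$

$\Pi \subseteq_{\varphi} \Pi_A$ iff

$$\forall n, \sigma_o, \sigma_a.\ (\varphi(\sigma_o) = \sigma_a) \Longrightarrow \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!].$$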

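The following lemma shows that every history of an object $\Pi$ could be generated
by the MGC.

Lemma 1 (MGC is the Most General). For any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and
$\sigma_a$, $\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$.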

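Proof. We define the simulation relation $\precsim_{\mathsf{MGC}}$ between a program and an MGC in
Figure 11(a), and prove the following (B.1) by case analysis and the operational
semantics:

For any $W_1$, $S_1$, $W_2$, $S_2$ and $e_1$, if $(W_1, S_1) \precsim_{\mathsf{MGC}} (W_2, S_2)$, then

(1) if $(W_1, S_1) \overset{e_1}{\longmapsto} \mathbf{abort}$ and $\mathsf{is\_obj\_abt}(e_1)$, then
there exists $T_2$ such that $(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$e_1 = \mathsf{get\_hist}(T_2)$;

(2) if $(W_1, S_1) \overset{e_1}{\longmapsto} (W_1', S_1')$, then
there exist $T_2$, $W_2'$ and $S_2'$ such that $(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*}\ (W_2', S_2')$,
$\mathsf{get\_hist}(e_1) = \mathsf{get\_hist}(T_2)$ and $(W_1', S_1') \precsim_{\mathsf{MGC}} (W_2', S_2')$.

(B.1)

$$(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n,\ (\sigma_c, \sigma_o, \{1 \leadsto \kappa_1, \ldots, n \leadsto \kappa_n\}))$$

$$\precsim_{\mathsf{MGC}}\ (\mathbf{let}\ \Pi\ \mathbf{in}\ C_1' \| \ldots \| C_n',\ (\emptyset, \sigma_o, \{1 \leadsto \kappa_1', \ldots, n \leadsto \kappa_n'\}))$$

where $\forall i.\ (C_i, \kappa_i) \precsim_{\mathsf{MGC}} (C_i', \kappa_i')$

$$(C, \circ) \precsim_{\mathsf{MGC}} (\mathsf{MGT}; \mathbf{end}, \circ) \qquad (C, (\sigma_l, x, C')) \precsim_{\mathsf{MGC}} (C, (\sigma_l, \cdot, (\mathbf{skip}; \mathsf{MGT}; \mathbf{end})))$$

(a) Program is Simulated by MGC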

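$$(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n,\ (\sigma_c, \sigma_o, \{1 \leadsto \kappa_1, \ldots, n \leadsto \kappa_n\}))$$

$$\precsim_{\mathsf{MGCp}}\ (\mathbf{let}\ \Pi\ \mathbf{in}\ C_1' \| \ldots \| C_n',\ (\sigma_c', \sigma_o, \{1 \leadsto \kappa_1', \ldots, n \leadsto \kappa_n'\}))$$

where $\forall i.\ (C_i, \kappa_i) \precsim_{\mathsf{MGCp}} (C_i', \kappa_i')$ and $\sigma_c' = \{x_t \leadsto \_,\ y_t \leadsto \_,\ z_t \leadsto \_ \mid 1 \le t \le n\}$

$$(C, \circ) \precsim_{\mathsf{MGCp}} (\mathsf{MGTp}_t; \mathbf{end}, \circ)$$

$$(C, (\sigma_l, x, C')) \precsim_{\mathsf{MGCp}} (C, (\sigma_l, z_t, (\mathbf{skip}; \mathbf{print}(z_t); \mathsf{MGTp}_t; \mathbf{end})))$$

(b) Program is Simulated by MGCp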

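$$(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n,\ (\sigma_c, \sigma_o, \{1 \leadsto \kappa_1, \ldots, n \leadsto \kappa_n\}))$$

$$\precsim_{\mathsf{MGCp}^-}\ (\mathbf{let}\ \Pi\ \mathbf{in}\ C_1' \| \ldots \| C_n',\ (\emptyset, \sigma_o, \{1 \leadsto \kappa_1', \ldots, n \leadsto \kappa_n'\}))$$

where $\forall i.\ (C_i, \sigma_c, \kappa_i) \precsim_{\mathsf{MGCp}^-} (C_i', \kappa_i')$ and $\sigma_c = \{x_t \leadsto \_,\ y_t \leadsto \_,\ z_t \leadsto \_ \mid 1 \le t \le n\}$

$(C, \sigma_c, \circ) \precsim_{\mathsf{MGCp}^-}$ one of the following, by cases:

- $(C_0, (\{x \leadsto n\}, \cdot, (\mathbf{skip}; \mathsf{MGT}; \mathbf{end})))$
  if $(C = \mathbf{E}[z_t := f_{y_t}(x_t)] \vee C = \mathbf{E}[\mathbf{skip}; z_t := f_{y_t}(x_t)])$
  $\wedge\ \sigma_c(x_t) = n \wedge \sigma_c(y_t) = i \wedge \Pi(f_i) = (x, C_0)$

- $(\mathbf{fret}(n'), (\_, \cdot, (\mathbf{skip}; \mathsf{MGT}; \mathbf{end})))$
  if $(C = \mathbf{E}[\mathbf{print}(z_t)] \vee C = \mathbf{E}[\mathbf{skip}; \mathbf{print}(z_t)])$
  $\wedge\ \sigma_c(z_t) = n'$

- $(\mathsf{MGT}; \mathbf{end}, \circ)$ otherwise

$$(C, \sigma_c, (\sigma_l, z_t, C')) \precsim_{\mathsf{MGCp}^-} (C, (\sigma_l, \cdot, (\mathbf{skip}; \mathsf{MGT}; \mathbf{end})))$$

(c) MGCp is Simulated by MGC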

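$$(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n,\ (\sigma_c, \sigma_o, \{1 \leadsto \kappa_1, \ldots, n \leadsto \kappa_n\}))$$

$$\precsim\ (\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1' \| \ldots \| C_n',\ (\emptyset, \sigma_a, \{1 \leadsto \kappa_1', \ldots, n \leadsto \kappa_n'\});$$

$$\qquad \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1'' \| \ldots \| C_n'',\ (\sigma_c, \sigma_a, \{1 \leadsto \kappa_1'', \ldots, n \leadsto \kappa_n''\}))$$

where $\forall i.\ (C_i, \kappa_i) \precsim (C_i', \kappa_i';\ C_i'', \kappa_i'')$

and $\mathcal{H}[\![\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n,\ (\sigma_c, \sigma_o, \{1 \leadsto \kappa_1, \ldots, n \leadsto \kappa_n\})]\!]$

$\subseteq \mathcal{H}[\![\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1' \| \ldots \| C_n',\ (\emptyset, \sigma_a, \{1 \leadsto \kappa_1', \ldots, n \leadsto \kappa_n'\})]\!]$

$$(C, \circ) \precsim (C', \circ;\ C, \circ) \qquad (C, (\sigma_l, x, C_0)) \precsim (C', (\sigma_l', x', C_1);\ C', (\sigma_l', x, C_0))$$

(d) Concrete Program is Simulated by Abstract MGC and Abstract Program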

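Fig. 11: Simulations between Programs and MGC

With (B.1), we can prove the following by induction over the number of steps
generating the event trace of $\mathcal{H}[\![W_1, S_1]\!]$.

If $(\lfloor W_1 \rfloor, S_1) \precsim_{\mathsf{MGC}} (\lfloor W_2 \rfloor, S_2)$, then $\mathcal{H}[\![W_1, S_1]\!] \subseteq \mathcal{H}[\![W_2, S_2]\!]$.

Then, since

$$(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \precsim_{\mathsf{MGC}} (\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n \rfloor, (\emptyset, \sigma_o, \circledast))\,,$$

we are done. □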

For  linearizability,  the  MGC-version  is  equivalent  to  the  original  definition. 

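Lemma 2. $\Pi \preceq_\varphi \Pi_A \iff \Pi \preceq^{\mathsf{MGC}}_\varphi \Pi_A$.

Proof. 1. $\Pi \preceq_\varphi \Pi_A \Longrightarrow \Pi \preceq^{\mathsf{MGC}}_\varphi \Pi_A$:

For any $n$, $\sigma_o$, $\sigma_a$ and $T$ such that $T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$ and
$\varphi(\sigma_o) = \sigma_a$, from $\Pi \preceq_\varphi \Pi_A$, we know there exist $T_c$ and $T_a$ such that

$$T_c \in \mathsf{completions}(T) \wedge \Pi_A \rhd (\sigma_a, T_a) \wedge T_c \preceq_{\mathsf{lin}} T_a\,.$$

We only need to show that

$$\Pi_A \rhd (\sigma_a, T_a) \Longrightarrow \Pi_A \rhd^{\mathsf{MGC}}_n (\sigma_a, T_a)\,.$$

First we know $\forall i.\ \mathsf{tid}(T_a(i)) \in [1..n]$. Second, from $\Pi_A \rhd (\sigma_a, T_a)$, we know
there exist $n'$, $C_1, \ldots, C_{n'}$ and $\sigma_c$ such that $\mathsf{seq}(T_a)$ and

$$T_a \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_{n'}), (\sigma_c, \sigma_a, \circledast)]\!]\,.$$

If $n' \le n$, then we know

$$T_a \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_{n'} \| \mathbf{skip} \| \ldots \| \mathbf{skip}), (\sigma_c, \sigma_a, \circledast)]\!]\,.$$

From Lemma 1, we are done. Otherwise, since $T_a$ only contains events of
threads $1, \ldots, n$, we know the threads $n + 1, \ldots, n'$ do not access the
object. Similar to the proof of Lemma 1, we can construct simulations and
prove $T_a \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$. Thus we are done.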

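2. $\Pi \preceq^{\mathsf{MGC}}_\varphi \Pi_A \Longrightarrow \Pi \preceq_\varphi \Pi_A$:

For any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$, $\sigma_a$ and $T$ such that $\varphi(\sigma_o) = \sigma_a$ and $T \in
\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledast)]\!]$, from Lemma 1, we know

$$T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]\,.$$

From $\Pi \preceq^{\mathsf{MGC}}_\varphi \Pi_A$, we know there exist $T_c$ and $T_a$ such that

$$T_c \in \mathsf{completions}(T) \wedge \Pi_A \rhd^{\mathsf{MGC}}_n (\sigma_a, T_a) \wedge T_c \preceq_{\mathsf{lin}} T_a\,.$$

By definitions, we see

$$\Pi_A \rhd^{\mathsf{MGC}}_n (\sigma_a, T_a) \Longrightarrow \Pi_A \rhd (\sigma_a, T_a)\,.$$

Thus we are done. □

Below we prove an important lemma which relates the basic contextual
refinement to a refinement over MGC which considers histories instead of
observable behaviors. The idea behind this lemma will be useful in proving various
equivalence results, including those for progress properties.

Lemma 3. $\Pi \sqsubseteq_\varphi \Pi_A \iff \Pi \subseteq^{\mathsf{MGC}}_\varphi \Pi_A$.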

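Proof. 1. $\Pi \sqsubseteq_\varphi \Pi_A \Longrightarrow \Pi \subseteq^{\mathsf{MGC}}_\varphi \Pi_A$:

We first prove the following (a) and (b):

(a) For any $n$, $\sigma_o$, $\sigma_c$, $T$,

if $\sigma_c = \{x_t \leadsto \_,\ y_t \leadsto \_,\ z_t \leadsto \_ \mid 1 \le t \le n\}$ and
$T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$,
then there exists $B$ such that $T \approx B$ and
$B \in \mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp}_n), (\sigma_c, \sigma_o, \circledast)]\!]$,
where

$$\frac{}{\epsilon \approx \epsilon} \qquad \frac{e \approx e' \quad T \approx B}{e :: T \;\approx\; e' :: B}$$

$$(t, f_i, n) \approx (t, \mathbf{out}, (i, n)) \qquad (t, \mathbf{ret}, n) \approx (t, \mathbf{out}, n)$$

$$(t, \mathbf{obj}, \mathbf{abort}) \approx (t, \mathbf{obj}, \mathbf{abort})$$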
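The relation $\approx$ pairs each history event with the output event that the printing client MGCp would emit for it. A minimal Python sketch of that correspondence follows; the tuple encoding of events, the map `index_of` from method names to their indices $i$, and the function names are assumptions made only for illustration (in particular, no method is assumed to be named `ret` or `obj`).

```python
def to_output(event, index_of):
    """Map one history event to the output event it is related to."""
    thread, kind, value = event
    if kind == "obj":                     # (t, obj, abort) is related to itself
        return event
    if kind == "ret":                     # (t, ret, n) is related to (t, out, n)
        return (thread, "out", value)
    # (t, f_i, n) is related to (t, out, (i, n))
    return (thread, "out", (index_of[kind], value))

def related(T, B, index_of):
    """Check T ~ B event by event, following the two inference rules."""
    return len(T) == len(B) and all(
        to_output(e, index_of) == b for e, b in zip(T, B)
    )

index_of = {"inc": 1, "get": 2}
T = [(1, "inc", 5), (1, "ret", 5), (2, "get", 0)]
B = [(1, "out", (1, 5)), (1, "out", 5), (2, "out", (2, 0))]
```

Here `related(T, B, index_of)` holds, while changing any printed argument or return value in `B` breaks the relation.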

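Proof. We define the simulation relation $\precsim_{\mathsf{MGCp}}$ in Figure 11(b), and
prove the following (B.2) by case analysis and the operational semantics.
This simulation ensures that at the right side (MGCp), each output of the
method argument is immediately followed by invoking the method, and
each method return is immediately followed by printing out the return
value.

For any $W_1$, $S_1$, $W_2$, $S_2$ and $e_1$, if $(W_1, S_1) \precsim_{\mathsf{MGCp}} (W_2, S_2)$, then

(1) if $(W_1, S_1) \overset{e_1}{\longmapsto} \mathbf{abort}$ and $\mathsf{is\_obj\_abt}(e_1)$, then
there exists $T_2$ such that $(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$e_1 \approx \mathsf{get\_obsv}(T_2)$;

(2) if $(W_1, S_1) \overset{e_1}{\longmapsto} (W_1', S_1')$, then
there exist $T_2$, $W_2'$ and $S_2'$ such that $(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*}\ (W_2', S_2')$,
$\mathsf{get\_hist}(e_1) \approx \mathsf{get\_obsv}(T_2)$ and $(W_1', S_1') \precsim_{\mathsf{MGCp}} (W_2', S_2')$.

(B.2)

With (B.2), we can prove the following by induction over the number of
steps generating the event trace of $\mathcal{H}[\![W_1, S_1]\!]$.

If $(\lfloor W_1 \rfloor, S_1) \precsim_{\mathsf{MGCp}} (\lfloor W_2 \rfloor, S_2)$ and $T \in \mathcal{H}[\![W_1, S_1]\!]$, then
there exists $B$ such that $T \approx B$ and $B \in \mathcal{O}[\![W_2, S_2]\!]$.

Then, since

$$(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n \rfloor, (\emptyset, \sigma_o, \circledast)) \precsim_{\mathsf{MGCp}} (\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp}_n \rfloor, (\sigma_c, \sigma_o, \circledast))\,,$$

we are done.

(b) For any $n$, $\sigma_o$, $B$,

if $\sigma_c = \{x_t \leadsto \_,\ y_t \leadsto \_,\ z_t \leadsto \_ \mid 1 \le t \le n\}$ and
$B \in \mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp}_n), (\sigma_c, \sigma_o, \circledast)]\!]$,
then there exists $T$ such that $T \approx B$ and
$T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$.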

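Proof. We define the simulation relation $\precsim_{\mathsf{MGCp}^-}$ in Figure 11(c), and
prove the following (B.3) by case analysis and the operational semantics.
This simulation ensures two things. (i) Whenever the left side (MGCp)
prints out a method argument, the right side (MGC) invokes the method
using that argument. (ii) Whenever the left side prints out a return
value, the right side must return the same value. We can ensure (i) and
(ii) because $x_t$, $y_t$ and $z_t$ are all thread-local variables.

For any $W_1$, $S_1$, $W_2$, $S_2$ and $e_1$, if $(W_1, S_1) \precsim_{\mathsf{MGCp}^-} (W_2, S_2)$, then

(1) if $(W_1, S_1) \overset{e_1}{\longmapsto} \mathbf{abort}$, then
there exists $T_2$ such that $(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$\mathsf{get\_hist}(T_2) \approx e_1$;

(2) if $(W_1, S_1) \overset{e_1}{\longmapsto} (W_1', S_1')$, then
there exist $T_2$, $W_2'$ and $S_2'$ such that $(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*}\ (W_2', S_2')$,
$\mathsf{get\_hist}(T_2) \approx \mathsf{get\_obsv}(e_1)$ and $(W_1', S_1') \precsim_{\mathsf{MGCp}^-} (W_2', S_2')$.

(B.3)

With (B.3), we can prove the following by induction over the number of
steps generating the event trace of $\mathcal{O}[\![W_1, S_1]\!]$.

If $(\lfloor W_1 \rfloor, S_1) \precsim_{\mathsf{MGCp}^-} (\lfloor W_2 \rfloor, S_2)$ and $B \in \mathcal{O}[\![W_1, S_1]\!]$, then there
exists $T$ such that $T \approx B$ and $T \in \mathcal{H}[\![W_2, S_2]\!]$.

Then, since

$$(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp}_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \precsim_{\mathsf{MGCp}^-} (\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n \rfloor, (\emptyset, \sigma_o, \circledast))\,,$$

we are done.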

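Then, since $\Pi \sqsubseteq_\varphi \Pi_A$, we know

$$\forall n, \sigma_c, \sigma_o, \sigma_a.\ (\varphi(\sigma_o) = \sigma_a)$$

$$\Longrightarrow\ \mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp}_n), (\sigma_c, \sigma_o, \circledast)]\!] \subseteq \mathcal{O}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCp}_n), (\sigma_c, \sigma_a, \circledast)]\!]\,.$$

Thus from (a) and (b), we get

$$\forall n, \sigma_o, \sigma_a.\ (\varphi(\sigma_o) = \sigma_a)$$

$$\Longrightarrow\ \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]\,.$$

Then we are done.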

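2. $\Pi \subseteq^{\mathsf{MGC}}_\varphi \Pi_A \Longrightarrow \Pi \sqsubseteq_\varphi \Pi_A$:

We define the simulation relation $\precsim$ in Figure 11(d), and prove the following
(B.4) by case analysis and the operational semantics. This simulation relates
one program to two programs. We use the MGC at the abstract level to
help determine the abstract program that corresponds to the concrete one.
Specifically, we require that the histories generated by the concrete program can
also be generated by the abstract MGC. Then, when an abstract thread is in
a method call, its code should be the same as the MGC thread. Otherwise,
its code is the same as the concrete thread.

For any $W_1$, $S_1$, $W_2$, $S_2$, $W_3$, $S_3$ and $e_1$,
if $(W_1, S_1) \precsim (W_2, S_2;\ W_3, S_3)$, then

(1) if $(W_1, S_1) \overset{e_1}{\longmapsto} \mathbf{abort}$, then
there exists $T_3$ such that $(W_3, S_3) \overset{T_3}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$e_1 = \mathsf{get\_obsv}(T_3)$;

(2) if $(W_1, S_1) \overset{e_1}{\longmapsto} (W_1', S_1')$, then
there exist $T_2$, $T_3$, $W_2'$, $S_2'$, $W_3'$ and $S_3'$ such that
$(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*}\ (W_2', S_2')$, $(W_3, S_3) \overset{T_3}{\longmapsto}{}^{*}\ (W_3', S_3')$,
$\mathsf{get\_obsv}(e_1) = \mathsf{get\_obsv}(T_3)$ and $(W_1', S_1') \precsim (W_2', S_2';\ W_3', S_3')$.

(B.4)

With (B.4), we can prove the following by induction over the number of steps
generating the event trace of $\mathcal{O}[\![W_1, S_1]\!]$.

If $(W_1, S_1) \precsim (W_2, S_2;\ W_3, S_3)$, then $\mathcal{O}[\![W_1, S_1]\!] \subseteq \mathcal{O}[\![W_3, S_3]\!]$.

For any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$, by Lemma 1, we know

$$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]\,.$$

Since $\Pi \subseteq^{\mathsf{MGC}}_\varphi \Pi_A$, we know if $\varphi(\sigma_o) = \sigma_a$, then

$$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]\,.$$

Then we know

$$(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n,\ (\sigma_c, \sigma_o, \circledast))$$

$$\precsim\ (\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n,\ (\emptyset, \sigma_a, \circledast);\ \ \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n,\ (\sigma_c, \sigma_a, \circledast))\,.$$

Thus, we get

$$\mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledast)]\!] \subseteq \mathcal{O}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledast)]\!]\,.$$

Thus we are done. □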

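Then, we prove the following (B.5) and can get Theorem 1.

$$\Pi \subseteq^{\mathsf{MGC}}_\varphi \Pi_A \iff \Pi \preceq^{\mathsf{MGC}}_\varphi \Pi_A \qquad \text{(B.5)}$$

1. $\Pi \subseteq^{\mathsf{MGC}}_\varphi \Pi_A \Longrightarrow \Pi \preceq^{\mathsf{MGC}}_\varphi \Pi_A$:

We only need to prove the following lemma (remember we assume that each
$C_i$ in $\Pi_A$ is of the form $\langle C \rangle$ and it is always safe to execute $\Pi_A$).

Lemma 4 ($\Pi_A$ is Linearizable). For any $n$, $\sigma_a$ and $T$,
if $T \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$,
then there exist $T_c$ and $T_a$ such that $T_c \in \mathsf{completions}(T)$, $T_c \preceq_{\mathsf{lin}} T_a$,
$T_a \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$ and $\mathsf{seq}(T_a)$.

Proof. We define a new operational semantics, in which we additionally generate
two events at the single step of the method body. We know the method
body in the execution can only be $\langle C \rangle; \mathbf{noret}$, and hence the resulting code
after one step (if not blocked) must be $\mathbf{fret}(n')$ for some $n'$.

$$\frac{(\langle C \rangle; \mathbf{noret},\ \sigma_o \uplus \sigma_l) \longrightarrow_t (\mathbf{fret}(n'),\ \sigma_o' \uplus \sigma_l') \qquad \mathit{dom}(\sigma_l) = \mathit{dom}(\sigma_l') \qquad \sigma_l = \{y \leadsto n\} \qquad \Pi(f) = (y, \langle C \rangle)}{(\langle C \rangle; \mathbf{noret},\ (\sigma_c, \sigma_o, (\sigma_l, x, C_c))) \ \xrightarrow{[t, f, n]::[t, \mathbf{ret}, n']}_{t, \Pi}\ (\mathbf{fret}(n'),\ (\sigma_c, \sigma_o', (\sigma_l', x, C_c)))}$$

Here $[t, f, n]$ and $[t, \mathbf{ret}, n']$ are two new events (called atom-invocation event
and atom-return event respectively) generated for the new semantics. We
use $T|_{[]}$ to project the event trace $T$ to the new events, and use $\lfloor e \rfloor$ (and
$\lfloor T \rfloor$) to transform the new event (and the event trace) to an old event (and
a trace of old events), where $[t, f, n]$ is transformed to $(t, f, n)$ and $[t, \mathbf{ret}, n']$
is transformed to $(t, \mathbf{ret}, n')$. Other parts of the semantics are the same as
the operational semantics in Figure 5. We can define $\mathcal{T}_{[]}[\![W, S]\!]$ in a similar
way as $\mathcal{T}[\![W, S]\!]$, which uses the new semantics instead of the original one
and keeps all the events including the new events.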
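To make the two trace operations concrete, here is a minimal Python sketch. It assumes an encoding that is not part of the formal development: ordinary history events are tuples `(t, f, n)` and the new atomic events are lists `[t, f, n]`, and the traces contain no other kinds of events. The function names are hypothetical.

```python
def project_atomic(trace):
    """T|[] : keep only the atom-invocation and atom-return events."""
    return [e for e in trace if isinstance(e, list)]

def floor(trace):
    """Turn each new event [t, f, n] into the old event (t, f, n)."""
    return [tuple(e) for e in trace]

def get_hist(trace):
    """Keep only the ordinary history events (tuples in this encoding)."""
    return [e for e in trace if isinstance(e, tuple)]

# Two threads call inc(1) on a counter; thread 2's atomic step happens first.
T_new = [
    (1, "inc", 1), (2, "inc", 1),
    [2, "inc", 1], [2, "ret", 1],      # thread 2 runs its atomic method body
    [1, "inc", 1], [1, "ret", 2],      # thread 1 runs its atomic method body
    (1, "ret", 2), (2, "ret", 1),
]
T = get_hist(T_new)                    # the concurrent history
T_a = floor(project_atomic(T_new))     # the sequential history of atomic steps
```

In this example `T` interleaves the two calls, while `T_a` lists them one after the other in the order in which the atomic bodies ran, which is the sequential history that the rest of the proof relates `T` to.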

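(1) We can prove that there is a lock-step simulation between the original
semantics in Figure 5 and the new semantics. Then, for any $T$ such that

$$T \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]\,,$$

we have a corresponding execution under the new semantics to generate
$T_T$ such that

$$T_T \in \mathcal{T}_{[]}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]\,,$$

and $\mathsf{get\_hist}(T_T) = T$.

(2) Below we show:

If $T_T \in \mathcal{T}_{[]}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$, $T = \mathsf{get\_hist}(T_T)$ and
$T_a = \lfloor T_T|_{[]} \rfloor$,
then $\mathsf{seq}(T_a)$ and there exists $T_c$ such that $T_c \in \mathsf{completions}(T)$
and $T_c \preceq_{\mathsf{lin}} T_a$.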


Proof. By the new operational semantics, we know $\mathsf{seq}(T_a)$ holds.

Construct $T_c$ and Prove Linearizability Condition 1: By the new operational
semantics, we know that for any $t$, $T|_t$ and $T_a|_t$ must satisfy one
of the following:

(i) $T|_t = T_a|_t$; or

(ii) $\exists n.\ T|_t :: (t, \mathbf{ret}, n) = T_a|_t$; or

(iii) $\exists f, n.\ T|_t = T_a|_t :: (t, f, n)$.

We construct $T_e$ as follows. For any $t$, if it is the above case (ii), we append
the corresponding return event at the end of $T$. Since $\mathsf{well\_formed}(T)$
and $\mathsf{well\_formed}(T_a)$, we could prove $\mathsf{well\_formed}(T_e)$. Thus $T_e \in \mathsf{extensions}(T)$.
Also $T_e$ satisfies: for any $t$, one of the following holds:

(i) $T_e|_t = T_a|_t$; or

(ii) $\exists f, n.\ T_e|_t = T_a|_t :: (t, f, n)$.

Let $T_c = \mathsf{truncate}(T_e)$. Thus $T_c \in \mathsf{completions}(T)$.
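The construction of $T_e$ and $T_c$ can be replayed on small finite histories. The following Python sketch is only an illustration under assumed encodings: events are tuples `(t, f, n)` or `(t, "ret", n)`, histories are well-formed, and the helper names `proj`, `truncate` and `completion_for` are hypothetical.

```python
def is_ret(e):
    return e[1] == "ret"

def proj(T, t):
    """T|t : the sub-history of thread t."""
    return [e for e in T if e[0] == t]

def truncate(T):
    """Drop pending invocations, i.e. a thread's last event if it is an invocation."""
    last = {}
    for i, e in enumerate(T):
        last[e[0]] = i
    drop = {i for i in last.values() if not is_ret(T[i])}
    return [e for i, e in enumerate(T) if i not in drop]

def completion_for(T, T_a):
    """Build T_e by appending the returns required by case (ii), then truncate."""
    T_e = list(T)
    for t in sorted({e[0] for e in T_a}):
        Tt, Tat = proj(T, t), proj(T_a, t)
        if len(Tat) == len(Tt) + 1 and Tat[:-1] == Tt and is_ret(Tat[-1]):
            T_e.append(Tat[-1])        # case (ii): T|t :: (t, ret, n) = T_a|t
    return truncate(T_e)

# Thread 1 has taken effect but not returned; thread 3 has not taken effect yet.
T   = [(1, "inc", 1), (2, "inc", 1), (3, "get", 0), (2, "ret", 1)]
T_a = [(2, "inc", 1), (2, "ret", 1), (1, "inc", 1), (1, "ret", 2)]
T_c = completion_for(T, T_a)
```

In this run the pending call of thread 1 is completed with the return value recorded in `T_a`, the pending call of thread 3 is removed, and every per-thread projection of `T_c` agrees with that of `T_a`.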

Since $\forall t.\ \mathsf{is\_res}(\mathsf{last}(T_a|_t)) \wedge \mathsf{seq}(T_a|_t)$, we could prove that for any $t$,

(i) if $T_e|_t = T_a|_t$, then $T_c|_t = T_e|_t$;

(ii) if $T_e|_t = T_a|_t :: (t, f, n)$, then $T_c|_t = T_a|_t$.

Thus $\forall t.\ T_c|_t = T_a|_t$.

Prove Linearizability Condition 2: We informally show that the bijection
$\pi$ implicit in $\forall t.\ T_c|_t = T_a|_t$ preserves the response-invocation order.

Let $T_c(i)$ be a response event in $T_c$ and let $T_c(j)$ be an invocation event.
Then $\pi(i)$ and $\pi(j)$ are the indices of $T_c(i)$ and $T_c(j)$ in $T_a$ respectively.
Suppose $i < j$. By the construction of $T_c$ from $T$, we know the same
response and invocation events are in $T$, and the response happens before
the invocation. Let $i'$ and $j'$ be the indices of these events in $T$. Then
$i' < j'$. By the new operational semantics, we know in $T_T$, the atom-return
event is before the atom-invocation event since the history return
event is before the history invocation event. Thus $\pi(i) < \pi(j)$.

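(3) Finally, we show the following and finish the proof of the lemma:

If $T_T \in \mathcal{T}_{[]}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$ and $T_a = \lfloor T_T|_{[]} \rfloor$,
then $T_a \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$.

This is proved by constructing the following simulation $\precsim_{\mathsf{new}}$. This simulation
ensures that the right side invokes and returns from a method at
the time when the left side generates the new atomic events.

$$(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n,\ (\emptyset, \sigma_a, \{1 \leadsto \kappa_1, \ldots, n \leadsto \kappa_n\}))$$

$$\precsim_{\mathsf{new}}\ (\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1' \| \ldots \| C_n',\ (\emptyset, \sigma_a, \{1 \leadsto \kappa_1', \ldots, n \leadsto \kappa_n'\}))$$

where $\forall i.\ (C_i, \kappa_i) \precsim_{\mathsf{new}} (C_i', \kappa_i')$

$$(C, \circ) \precsim_{\mathsf{new}} (C, \circ)$$

$$((\langle C \rangle; \mathbf{noret}), (\sigma_l, \cdot, (\mathbf{skip}; \mathsf{MGT}))) \precsim_{\mathsf{new}} ((f_{\mathsf{rand}(m)}(\mathsf{rand}()); \mathsf{MGT}), \circ)$$

$$(\mathbf{fret}(n'), (\sigma_l, \cdot, (\mathbf{skip}; \mathsf{MGT}))) \precsim_{\mathsf{new}} ((\mathbf{skip}; \mathsf{MGT}), \circ)$$

We prove the following by case analysis and the operational semantics.
For any $W_1$, $S_1$, $W_2$, $S_2$ and $T_1$,

if $(W_1, S_1) \precsim_{\mathsf{new}} (W_2, S_2)$ and $(W_1, S_1) \overset{T_1}{\longmapsto} (W_1', S_1')$ in the new
semantics,

then there exist $T_2$, $W_2'$ and $S_2'$ such that $(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*}\ (W_2', S_2')$,
$\mathsf{get\_hist}(T_2) = \lfloor T_1|_{[]} \rfloor$ and $(W_1', S_1') \precsim_{\mathsf{new}} (W_2', S_2')$.

Then we can prove the following by induction over the number of steps
generating the event trace of $\mathcal{T}_{[]}[\![W_1, S_1]\!]$.

If $(W_1, S_1) \precsim_{\mathsf{new}} (W_2, S_2)$, $T_T \in \mathcal{T}_{[]}[\![W_1, S_1]\!]$ and $T_a = \lfloor T_T|_{[]} \rfloor$,
then $T_a \in \mathcal{H}[\![W_2, S_2]\!]$.

Since we know

$$(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n,\ (\emptyset, \sigma_a, \circledast)) \precsim_{\mathsf{new}} (\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n,\ (\emptyset, \sigma_a, \circledast))\,,$$

we are done.

The lemma is immediate from the above (1), (2) and (3). □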

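2. $\Pi \preceq^{\mathsf{MGC}}_\varphi \Pi_A \Longrightarrow \Pi \subseteq^{\mathsf{MGC}}_\varphi \Pi_A$:

We only need to prove the following lemma (similar to the Rearrangement
Lemma in [6]):

Lemma 5 (Rearrangement). For any $n$, $\sigma_a$, $T$ and $T_a$,
if $T \preceq_{\mathsf{lin}} T_a$, $T_a \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$ and $\mathsf{seq}(T_a)$,
then $T \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$.

Proof. Suppose $|T| = n$. We know $T$ must not contain the abort event. From
$T \preceq_{\mathsf{lin}} T_a$, we know

(i) $\forall t.\ T|_t = T_a|_t$;

(ii) there exists a bijection $\pi : \{1, \ldots, |T|\} \to \{1, \ldots, |T_a|\}$ such that $\forall i.\ T(i) = T_a(\pi(i))$
and $\forall i, j.\ i < j \wedge \mathsf{is\_res}(T(i)) \wedge \mathsf{is\_inv}(T(j)) \Longrightarrow \pi(i) < \pi(j)$.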
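Conditions (i) and (ii) can be checked directly on finite histories. The sketch below is a hedged illustration rather than the paper's definition: it assumes events are encoded as tuples `(t, f, n)` or `(t, "ret", n)`, and it fixes $\pi$ to be the bijection induced by per-thread positions (the $k$-th event of thread $t$ in $T$ is mapped to the $k$-th event of thread $t$ in $T_a$). The name `is_lin` is hypothetical.

```python
from collections import defaultdict

def is_ret(e):
    return e[1] == "ret"

def proj(T, t):
    return [e for e in T if e[0] == t]

def is_lin(T, T_a):
    """Check (i) equal per-thread projections and (ii) response-invocation order."""
    threads = {e[0] for e in T} | {e[0] for e in T_a}
    if any(proj(T, t) != proj(T_a, t) for t in threads):
        return False                               # condition (i) fails
    positions = defaultdict(list)                  # thread -> indices in T_a
    for idx, e in enumerate(T_a):
        positions[e[0]].append(idx)
    seen = defaultdict(int)
    pi = []                                        # pi[i] = index of T[i] in T_a
    for e in T:
        pi.append(positions[e[0]][seen[e[0]]])
        seen[e[0]] += 1
    return all(                                    # condition (ii)
        pi[i] < pi[j]
        for i in range(len(T))
        for j in range(i + 1, len(T))
        if is_ret(T[i]) and not is_ret(T[j])
    )

T_a      = [(2, "inc", 1), (2, "ret", 1), (1, "inc", 1), (1, "ret", 2)]
overlap  = [(1, "inc", 1), (2, "inc", 1), (2, "ret", 1), (1, "ret", 2)]
disjoint = [(1, "inc", 1), (1, "ret", 2), (2, "inc", 1), (2, "ret", 1)]
```

With these inputs `is_lin(overlap, T_a)` holds because the two calls overlap and may be reordered, whereas `is_lin(disjoint, T_a)` fails because thread 1 returns before thread 2 invokes, so `T_a` would have to keep that order.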

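We construct the execution under the new semantics (defined in the proof
of Lemma 4) which generates $T$, and the new events constitute $T_a$, i.e., we
want to show the following holds:

$$\exists T_T.\ T_T \in \mathcal{T}_{[]}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!] \wedge T = \mathsf{get\_hist}(T_T)\,. \qquad \text{(B.6)}$$

Then we prove that there is a lock-step simulation between the new semantics
and the original semantics in Figure 5, and we can get

$$T \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]\,.$$

Below we prove (B.6). We prove that for any $k$, there exist $T_T$, $W'$, $S'$ and
$k'$ such that

$$(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n,\ (\emptyset, \sigma_a, \circledast)) \overset{T_T}{\longmapsto}{}^{*}\ (W', S')$$

$$\wedge\ \mathsf{get\_hist}(T_T) = T(1..k) \ \wedge\ \lfloor T_T|_{[]} \rfloor = T_a(1..k')$$

$$\wedge\ (\forall S''.\ (\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n,\ (\emptyset, \sigma_a, \circledast)) \overset{T_a(1..k')}{\longmapsto}{}^{*}\ (\_, S'') \Longrightarrow S''|_{\mathsf{obj}} = S'|_{\mathsf{obj}})\,,$$

where $S'|_{\mathsf{obj}}$ gets the object memory in $S'$.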

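By induction over $k$.

Base Case: If $k = 0$, trivial.

Inductive Step: Suppose there exist $T_1$, $W_1$, $S_1$ and $k_1$ such that

$$(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n,\ (\emptyset, \sigma_a, \circledast)) \overset{T_1}{\longmapsto}{}^{*}\ (W_1, S_1)$$

$$\wedge\ \mathsf{get\_hist}(T_1) = T(1..k) \ \wedge\ \lfloor T_1|_{[]} \rfloor = T_a(1..k_1)$$

$$\wedge\ (\forall S_1'.\ (\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n,\ (\emptyset, \sigma_a, \circledast)) \overset{T_a(1..k_1)}{\longmapsto}{}^{*}\ (\_, S_1') \Longrightarrow S_1'|_{\mathsf{obj}} = S_1|_{\mathsf{obj}})\,,$$

we want to show there exist $T_2$, $W_2$, $S_2$ and $k_2$ such that

$$(W_1, S_1) \overset{T_2}{\longmapsto}{}^{*}\ (W_2, S_2)$$

$$\wedge\ \mathsf{get\_hist}(T_2) = T(k+1) \ \wedge\ \lfloor T_2|_{[]} \rfloor = T_a(k_1 + 1..k_2)$$

$$\wedge\ (\forall S_2'.\ (\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n,\ (\emptyset, \sigma_a, \circledast)) \overset{T_a(1..k_2)}{\longmapsto}{}^{*}\ (\_, S_2') \Longrightarrow S_2'|_{\mathsf{obj}} = S_2|_{\mathsf{obj}})\,.$$

By case analysis.

(a) $T(k+1) = (t, f, n')$.

Suppose $T(k+1) = (T|_t)(i)$.

From $T|_t = T_a|_t$ and $T_a \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$, we know
$i = 1$ or $\mathsf{is\_ret}((T|_t)(i-1))$ holds.

i. If $i = 1$, we just let the code MGT of the thread $t$ execute up to the call of
the method $f$ with the argument $n'$, generating the event $(t, f, n')$.

ii. If $\mathsf{is\_ret}((T|_t)(i-1))$ holds, we know the code of the thread $t$ is in the
client code. Still we can let it execute up to the method call of $f$ with
the argument $n'$, generating the event $(t, f, n')$.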

(b) $T(k+1) = (t, \mathbf{ret}, n')$.

Suppose $T(k+1) = (T|_t)(i)$. Similar to the previous case, we know
$\mathsf{is\_inv}((T|_t)(i-1))$ holds. Suppose $(T|_t)(i-1) = e = (t, f, n)$ and $\Pi_A(f) =
(x, \langle C \rangle)$. Thus the code of the thread $t$ is either $\langle C \rangle; \mathbf{noret}$ or $\mathbf{fret}(n'')$
(for some $n''$).

i. The code of $t$ is $\langle C \rangle; \mathbf{noret}$.

Thus $\mathsf{last}(T_1|_t) = e$. Suppose $|T(1..k)|_t| = n_1$. From the operational
semantics and the generation of $T_1$, we know $|T_a(1..k_1)|_t| = n_1 - 1$.
For the bijection $\pi$ in (ii) which maps events in $T$ to events of $T_a$, we
let $k_2 = \pi(k+1)$. Since $T|_t = T_a|_t$, we know $k_2 > k_1$. Let $k' = k_2 - k_1$.
Suppose $T_a(k_1 + 1..k_2) = e_1 :: \ldots :: e_{k'}$. Since $\lfloor T_1|_{[]} \rfloor = T_a(1..k_1)$,
by the operational semantics and the generation of $T_1$, we know
$\mathsf{is\_ret}(T_a(k_1))$. Since $\mathsf{seq}(T_a)$, we know $\mathsf{seq}(e_1 :: \ldots :: e_{k'})$ and $k' = 2j$.
Suppose the threads of the events $e_1, \ldots, e_{k'}$ are $t_1, \ldots, t_j$ respectively
where $t_j = t$. Below we prove that for any $i$ such that $1 \le
i \le j$, the current code of the thread $t_i$ is $\langle C_i \rangle; \mathbf{noret}$ (for some
method body $\langle C_i \rangle$), and $e_{2i-1} = \mathsf{last}(T(1..k)|_{t_i})$. The proof is by
contradiction. Suppose $e_{2i-1} = T(i')$ and $i' > k$. Since $T(k+1) =
(t, \mathbf{ret}, n')$ and $\mathsf{is\_inv}(e_{2i-1})$, we know $i' > k + 1$. By (ii), we know
$\pi(i') > \pi(k+1) = k_2$, which contradicts the fact that $e_{2i-1}$ is an
event in $T_a(k_1 + 1..k_2)$. Thus, $i' \le k$, and since $\lfloor T_1|_{[]} \rfloor = T_a(1..k_1)$,
by the operational semantics and the generation of $T_1$, we know
$e_{2i-1} = \mathsf{last}(T_1|_{t_i})$. Thus we are done.

We let the threads $t_1, \ldots, t_j$ execute one step in order, generating
an event trace which only contains the atom-invocation and
atom-return events, and then the thread $t_j$ executes one more step
generating the event $T_a(k_2) = T_a(\pi(k+1)) = T(k+1)$. Since $T_a \in
\mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$, we know this execution is possible,
and moreover we have $\lfloor T_2|_{[]} \rfloor = T_a(k_1 + 1..k_2)$.

ii. The code of $t$ is $\mathbf{fret}(n'')$.

Thus $\mathsf{last}(T_1|_t) = [t, \mathbf{ret}, n'']$. Since $\lfloor T_1|_{[]} \rfloor = T_a(1..k_1)$, we know
$\mathsf{last}(T_a(1..k_1)|_t) = (t, \mathbf{ret}, n'')$. Suppose $|T_a(1..k_1)|_t| = n_1$.
From the operational semantics and the generation of $T_1$, we know
$|\mathsf{get\_hist}(T_1|_t)| = |T(1..k)|_t| = n_1 - 1$. Since $T|_t = T_a|_t$, we know
$n' = n''$. The code of $t$ is $\mathbf{fret}(n')$. We let it execute one step and
generate the event $(t, \mathbf{ret}, n')$.

Thus (B.6) holds and we are done. □

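From $\Pi \preceq^{\mathsf{MGC}}_\varphi \Pi_A$, we know

$$\forall n, \sigma_o, \sigma_a, T.\ T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \wedge (\varphi(\sigma_o) = \sigma_a)$$

$$\Longrightarrow\ \exists T_c, T_a.\ T_c \in \mathsf{completions}(T) \wedge T_a \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$$

$$\wedge\ \mathsf{seq}(T_a) \wedge T_c \preceq_{\mathsf{lin}} T_a$$

From Lemma 5, we know

$$\forall n, \sigma_o, \sigma_a, T.\ T \in \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \wedge (\varphi(\sigma_o) = \sigma_a)$$

$$\Longrightarrow\ \exists T_c.\ T_c \in \mathsf{completions}(T) \wedge T_c \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$$

Since $T_c \in \mathsf{completions}(T)$, we know there exists $T_e$ such that $T_c = \mathsf{truncate}(T_e)$
and $T_e \in \mathsf{extensions}(T)$. By the definition of $\mathsf{truncate}(T_e)$, we can prove:

$$T_e \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$$

Then, by the definition of $T_e \in \mathsf{extensions}(T)$, we can prove:

$$T \in \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$$

Thus we get $\Pi \subseteq^{\mathsf{MGC}}_\varphi \Pi_A$.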


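B.2 Proofs of Figures 1 and 7

Lemma 6 (Figure 7). Assume $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledast)]\!]$.

1. $\text{wait-free}(T) \iff \text{prog-t}(T) \vee \text{non-sched}(T) \vee \mathsf{abt}(T) \iff \text{non-sched}(T) \vee \mathsf{abt}(T)$;

2. $\text{lock-free}(T) \iff \text{prog-s}(T) \vee \text{non-sched}(T) \vee \mathsf{abt}(T) \iff \text{wait-free}(T) \vee \text{prog-s}(T)$;

3. $\text{obstruction-free}(T) \iff \text{prog-t}(T) \vee \text{non-sched}(T) \vee \neg\mathsf{iso}(T) \vee \mathsf{abt}(T) \iff \text{lock-free}(T) \vee \neg\mathsf{iso}(T)$;

4. $\text{deadlock-free}(T) \iff \text{prog-s}(T) \vee \neg\mathsf{fair}(T) \vee \mathsf{abt}(T) \iff \text{lock-free}(T) \vee \neg\mathsf{fair}(T)$;

5. $\text{starvation-free}(T) \iff \text{prog-t}(T) \vee \neg\mathsf{fair}(T) \vee \mathsf{abt}(T) \iff \text{wait-free}(T) \vee \neg\mathsf{fair}(T)$.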


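Proof. 1. By definition.

$$\begin{array}{rl}
\text{wait-free}(T) \iff & (\forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \\
 & \quad \Longrightarrow (\exists j.\ j > i \wedge \mathsf{match}(e, T(j))) \\
 & \qquad \vee\ (\exists j.\ j > i \wedge (\forall k > j.\ \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)))\,) \\
 & \vee\ \mathsf{abt}(T) \\
\iff & (\forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \wedge \neg(\exists j.\ j > i \wedge \mathsf{match}(e, T(j))) \\
 & \quad \Longrightarrow (\exists j.\ j > i \wedge (\forall k > j.\ \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)))\,) \\
 & \vee\ \mathsf{abt}(T) \\
\iff & (\forall e.\ e \in \mathsf{pend\_inv}(T) \Longrightarrow (\exists j.\ \forall k > j.\ \mathsf{tid}(T(k)) \neq \mathsf{tid}(e))\,) \\
 & \vee\ \mathsf{abt}(T) \\
\iff & \text{non-sched}(T) \vee \mathsf{abt}(T)
\end{array}$$

Also, we can prove $\text{prog-t}(T) \Longrightarrow \text{non-sched}(T)$ as follows.

$$\begin{array}{rl}
\text{prog-t}(T) \iff & (\forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \Longrightarrow \exists j.\ j > i \wedge \mathsf{match}(e, T(j))\,) \\
\iff & (\forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \Longrightarrow e \notin \mathsf{pend\_inv}(T)\,) \\
\iff & (\mathsf{pend\_inv}(T) = \emptyset) \\
\Longrightarrow & \text{non-sched}(T)
\end{array}$$

2. We only need to prove the first equivalence. The second is trivial from the
first one.

$$\begin{array}{rl}
\text{lock-free}(T) \iff & (\forall i, e.\ e \in \mathsf{pend\_inv}(T(1..i)) \\
 & \quad \Longrightarrow (\exists j.\ j > i \wedge \mathsf{is\_ret}(T(j))) \\
 & \qquad \vee\ (\exists j.\ j > i \wedge (\forall k > j.\ \mathsf{is\_clt}(T(k))))\,) \\
 & \vee\ \mathsf{abt}(T) \\
\iff & \text{prog-s}(T) \\
 & \vee\ (\exists j.\ \forall k > j.\ \mathsf{is\_clt}(T(k))\,) \\
 & \vee\ \mathsf{abt}(T)
\end{array}$$

From $\exists j.\ \forall k > j.\ \mathsf{is\_clt}(T(k))$ and the operational semantics generating $T$,
we know $\text{non-sched}(T)$ holds.

If $\text{non-sched}(T)$ holds, we know there exists $j$ such that $\forall k > j.\ \mathsf{tid}(T(k)) \notin
\mathsf{tid}(\mathsf{pend\_inv}(T))$, where $\mathsf{tid}(\mathsf{pend\_inv}(T))$ gets the set of thread IDs of the
pending invocations in $T$. Then by the operational semantics and the generation
of $T$, we know either $\exists j.\ \forall k > j.\ \mathsf{is\_clt}(T(k))$ or $\text{prog-s}(T)$ holds.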

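3. For obstruction-freedom, we only need to prove the following:

(1) $\forall T.\ \mathsf{iso}(T) \wedge \text{obstruction-free}(T) \Longrightarrow \text{wait-free}(T)$;

(2) $\forall T.\ \text{wait-free}(T) \Longrightarrow \text{obstruction-free}(T)$;

(3) $\forall T.\ \neg\mathsf{iso}(T) \Longrightarrow \text{obstruction-free}(T)$;

(4) $\forall n, C_1, \ldots, C_n, \sigma_c, \sigma_o, T$.
$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledast)]\!] \wedge \text{prog-s}(T)$
$\Longrightarrow \text{obstruction-free}(T)$.

For (1) $\forall T.\ \mathsf{iso}(T) \wedge \text{obstruction-free}(T) \Longrightarrow \text{wait-free}(T)$:

Proof. By $\text{obstruction-free}(T)$, we know one of the following holds:

(a) there exists $i$ such that $\mathsf{is\_abt}(T(i))$ holds; or

(b) for any $i$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..i))$, then one of the following holds:

(i) there exists $j > i$ such that $\mathsf{match}(e, T(j))$; or

(ii) $\forall j > i.\ \exists k.\ k > j \wedge \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)$.

For (a), we know $\text{wait-free}(T)$.

For (b), for any $i$ and $e$, where $e \in \mathsf{pend\_inv}(T(1..i))$, we let $t = \mathsf{tid}(e)$. Since
$\mathsf{iso}(T)$, we know

$$|T| \neq \omega \ \vee\ \exists t, i.\ (\forall j.\ j > i \Longrightarrow \mathsf{tid}(T(j)) = t)\,.$$

If $|T| \neq \omega$, we know (ii) cannot hold. Thus (i) must hold.

Otherwise, we know there exist $t_0$ and $i_0$ such that

$$\forall j.\ j > i_0 \Longrightarrow \mathsf{tid}(T(j)) = t_0\,.$$

If $t_0 = t$, we know (ii) does not hold, and hence (i) holds. Otherwise, if $t_0 \neq t$,
we know

$$\forall k.\ k > i_0 \Longrightarrow \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)\,.$$

Thus we know $\text{wait-free}(T)$.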

For (2) $\forall T.\ \text{wait-free}(T) \Longrightarrow \text{obstruction-free}(T)$:

Proof. From $\text{wait-free}(T)$, we know one of the following holds:

(i) there exists $i$ such that $\mathsf{is\_abt}(T(i))$ holds; or

(ii) for any $i$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..i))$, then one of the following holds:

(1) there exists $j > i$ such that $\forall k > j.\ \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)$; or

(2) there exists $j > i$ such that $\mathsf{match}(e, T(j))$.

For (i), we know $\text{obstruction-free}(T)$ holds.

For (ii), for any $i$ and $e$, where $e \in \mathsf{pend\_inv}(T(1..i))$, if (1) holds, we know
$\forall j > i.\ \exists k.\ k > j \wedge \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)$.

Thus we know $\text{obstruction-free}(T)$.

For (3) $\forall T.\ \neg\mathsf{iso}(T) \Longrightarrow \text{obstruction-free}(T)$:

Proof. From $\neg\mathsf{iso}(T)$, we know

$$|T| = \omega \ \wedge\ \forall t, i.\ \exists j.\ j > i \wedge \mathsf{tid}(T(j)) \neq t\,.$$

Thus, for any $i$ and $e$, where $e \in \mathsf{pend\_inv}(T(1..i))$, we know
$\forall j.\ \exists k.\ k > j \wedge \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)$.

Thus we have proved $\text{obstruction-free}(T)$.

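For (4) $\forall n, C_1, \ldots, C_n, \sigma_c, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledast)]\!] \wedge
\text{prog-s}(T) \Longrightarrow \text{obstruction-free}(T)$:

Proof. From $\text{prog-s}(T)$, we know: for any $i$, if $\mathsf{pend\_inv}(T(1..i)) \neq \emptyset$, then
there exists $j > i$ such that $\mathsf{is\_ret}(T(j))$.

If $|T| \neq \omega$, by Lemma 17, we know $\text{obstruction-free}(T)$ holds. Otherwise,
$|T| = \omega$. For any $i$ and $e$ such that $e \in \mathsf{pend\_inv}(T(1..i))$, we know one of the
following must hold:

(1) there exists $j > i$ such that $\mathsf{match}(e, T(j))$; or

(2) $\forall j.\ j > i \Longrightarrow \neg\mathsf{match}(e, T(j))$.

For (2), we know

$$\forall j.\ j > i \Longrightarrow e \in \mathsf{pend\_inv}(T(1..j))\,.$$

Thus we have

$$\forall j.\ j > i \Longrightarrow \exists k.\ k > j \wedge \mathsf{is\_ret}(T(k))\,.$$

Then we know

$$\forall j > i.\ \exists k.\ k > j \wedge \mathsf{is\_ret}(T(k)) \wedge \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)\,.$$

Thus we know $\text{obstruction-free}(T)$.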

4. The first equivalence is trivial from definition. For the second equivalence,
we only need to prove the following:

$$\text{non-sched}(T) \wedge \neg\text{prog-s}(T) \Longrightarrow \neg\mathsf{fair}(T)\,.$$

From the proof of the equivalences for wait-freedom, we know

$$(\mathsf{pend\_inv}(T) = \emptyset) \iff \text{prog-t}(T)\,.$$
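The equivalence between prog-t$(T)$ and $\mathsf{pend\_inv}(T) = \emptyset$ can be sanity-checked on small finite histories. The Python sketch below is an illustration only: the paper's definitions also cover infinite traces, events are assumed to be tuples `(t, f, n)` or `(t, "ret", n)` in a well-formed history, and `match` is approximated by "a later return of the same thread". The function names are hypothetical.

```python
def is_ret(e):
    return e[1] == "ret"

def pend_inv(T):
    """Invocations in T that have no matching return yet (one per thread at most)."""
    pending = {}
    for e in T:
        if is_ret(e):
            pending.pop(e[0], None)
        else:
            pending[e[0]] = e
    return list(pending.values())

def prog_t(T):
    """Every invocation pending after some prefix is matched by a later return."""
    for i in range(len(T) + 1):
        for e in pend_inv(T[:i]):
            if not any(is_ret(x) and x[0] == e[0] for x in T[i:]):
                return False
    return True

stuck    = [(1, "inc", 1), (2, "get", 0), (1, "ret", 1)]   # thread 2 never returns
complete = stuck + [(2, "ret", 1)]
```

On both sample histories `prog_t(T)` agrees with `pend_inv(T) == []`: it fails for `stuck` and holds for `complete`.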

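Thus we only need to prove the following.

(1) $\text{non-sched}(T) \wedge (\mathsf{pend\_inv}(T) \neq \emptyset) \Longrightarrow \neg\mathsf{fair}(T)$;

(2) $\text{prog-t}(T) \Longrightarrow \text{prog-s}(T)$.

For (1), from the premises, we know

$$\exists e, i.\ e \in \mathsf{pend\_inv}(T) \wedge \forall j > i.\ \mathsf{tid}(T(j)) \neq \mathsf{tid}(e)\,.$$

Thus from the operational semantics and the generation of $T$, we know

$$|T| = \omega \ \wedge\ \exists t \in [1..\mathsf{tnum}(T)].\ |(T|_t)| \neq \omega \wedge \mathsf{last}(T|_t) \neq (t, \mathbf{term})\,.$$

Thus $\neg\mathsf{fair}(T)$ holds.

(2) is trivial from definition.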

5. The first equivalence is trivial from definition. For the second equivalence,
we only need to prove the following:

$$\text{non-sched}(T) \wedge \neg\text{prog-t}(T) \Longrightarrow \neg\mathsf{fair}(T)\,.$$

It has been proved in the proofs for the equivalences for deadlock-freedom.

□

From  Lemma  6,  we  can  get  most  of  the  implications  in  the  lattice  of  Figure  1. 
To  prove  the  remaining  implications  on  sequential  termination,  we  first  prove 
some  equivalences  in  the  sequential  setting  below. 

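Lemma 7 (Equivalences in Sequential Setting). For any $C_1$, $\sigma_c$, $\sigma_o$ and
$T$, if $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1), (\sigma_c, \sigma_o, \circledast)]\!]$, then

1. $\mathsf{fair}(T)$ and $\mathsf{iso}(T)$ hold;

2. $\text{lock-free}(T) \iff \text{wait-free}(T) \iff \text{obstruction-free}(T) \iff \text{deadlock-free}(T)
\iff \text{starvation-free}(T)$.

Proof. 1. Since $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1), (\sigma_c, \sigma_o, \circledast)]\!]$, by the operational
semantics we know $T(1) = (\mathbf{spawn}, 1)$ and

$$\forall i.\ 2 \le i \le |T| \Longrightarrow \mathsf{tid}(T(i)) = 1\,.$$

If $|T| = \omega$, we know $|(T|_1)| = |T| = \omega$. Thus $\mathsf{fair}(T)$ and $\mathsf{iso}(T)$.

2. By Lemma 6 and the above case. □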

From  Lemmas  6  and  7,  we  get  the  following  theorem. 

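Theorem 3 (Figure 1).

1. $\text{wait-free}_\varphi(\Pi) \Longrightarrow \text{lock-free}_\varphi(\Pi)$;

2. $\text{wait-free}_\varphi(\Pi) \Longrightarrow \text{starvation-free}_\varphi(\Pi)$;

3. $\text{lock-free}_\varphi(\Pi) \Longrightarrow \text{obstruction-free}_\varphi(\Pi)$;

4. $\text{lock-free}_\varphi(\Pi) \Longrightarrow \text{deadlock-free}_\varphi(\Pi)$;

5. $\text{starvation-free}_\varphi(\Pi) \Longrightarrow \text{deadlock-free}_\varphi(\Pi)$;

6. $\text{obstruction-free}_\varphi(\Pi) \Longrightarrow \text{seq-term}_\varphi(\Pi)$;

7. $\text{deadlock-free}_\varphi(\Pi) \Longrightarrow \text{seq-term}_\varphi(\Pi)$.

B.3 Proofs of Theorem ??


Lemma 8 (Finite trace must be lock-free). For any $T$, if
$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledast)]\!]$
and $|T| \neq \omega$, then $\text{lock-free}(T)$ must hold.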

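Proof. Suppose $T = (\mathbf{spawn}, n) :: T'$. We know one of the following holds:

(i) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T'}{\longmapsto}{}^{*}\ \mathbf{abort}$; or

(ii) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T'}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$.

For either case, we can prove $\text{lock-free}(T)$ by the operational semantics. □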

We  define  the  MGC  version  of  lock-freedom. 

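Definition 7. lock-free$^{\mathsf{MGC}}_\varphi(\Pi)$, iff

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \land (\sigma_o \in \mathit{dom}(\varphi))$

$\implies (\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$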

We  use  get_objevt(T)  to  project  T  to  the  sub-trace  of  object  events  (including 
method  invocation,  return,  object  fault,  and  normal  object  actions).  Thus  we 
know: 

$\forall T, T'.\ (\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')) \implies (\mathsf{get\_hist}(T) = \mathsf{get\_hist}(T'))$.
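
The projection can be pictured with a small Python sketch. The event encoding (tuples whose second component is a kind tag such as "inv", "ret", "obj", "clt", "out", "term") and the choice of events kept by get_hist are assumptions made only for illustration; the text does not fix a concrete representation at this point.

```python
# Hypothetical event encoding: (tid, kind, *payload).
# Kinds assumed here: "inv", "ret", "obj" (object events) and
# "clt", "out", "term" (client-side events).
OBJECT_KINDS = {"inv", "ret", "obj"}


def is_obj_abt(e):
    # Object fault, encoded here (by assumption) as (tid, "obj", "abort").
    return len(e) == 3 and e[1] == "obj" and e[2] == "abort"


def get_objevt(trace):
    """Keep only object events: invocations, returns, object faults,
    and normal object actions."""
    return [e for e in trace if e[1] in OBJECT_KINDS]


def get_hist(trace):
    """Assumed history projection: invocations, returns, object faults."""
    return [e for e in trace if e[1] in ("inv", "ret") or is_obj_abt(e)]


# Two traces that differ only in client-side events.
t1 = [(1, "clt"), (1, "inv", 7), (1, "obj"), (2, "clt"), (1, "ret", 0)]
t2 = [(1, "inv", 7), (2, "out", 1), (1, "obj"), (1, "ret", 0), (1, "term")]

same_objevt = get_objevt(t1) == get_objevt(t2)
same_hist = get_hist(t1) == get_hist(t2)
```

In this sketch get_hist only discards further events from what get_objevt keeps, so equal object-event projections force equal histories, which is the shape of the implication stated above.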

The  following  lemma  is  similar  to  Lemma  1  (MGC  is  the  most  general) .  But 
here  we  take  into  account  infinite  traces  generated  by  complete  executions. 

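Lemma 9. For any $T$, if

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$,

then one of the following holds:

(1) $|T| \neq \omega$; or

(2) there exists $i$ such that $\forall j > i.\ \mathsf{is\_clt}(T(j))$; or

(3) there exists $T_m$ such that

$T_m \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$,
and $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$.

Proof. By co-induction over $T \in \mathcal{T}_\omega[\![W, S]\!]$, where

$(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\_, \_, \circledast)) \longmapsto^{*} (W, S) \land (W \neq \mathbf{skip})$.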
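In other words, $(W, S)$ is a "well-formed" configuration. We only need to prove
the following (B.7):

for any $T$, $W$, $S$, $W_m$ and $S_m$, if

(a) $(W, S) \precsim_{\mathsf{MGC}} (W_m, S_m)$,

(b) $T \in \mathcal{T}_\omega[\![W, S]\!]$, and

(c) $\forall i.\ \exists j.\ j > i \land \neg\mathsf{is\_clt}(T(j)) \land T(j) \neq (\_, \mathbf{term})$,

then there exists $T_m$ such that $T_m \in \mathcal{T}_\omega[\![W_m, S_m]\!]$ and
$\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$.

(B.7)

Here $\precsim_{\mathsf{MGC}}$ is defined in Figure 11(a). We first prove that $\precsim_{\mathsf{MGC}}$ is a simulation:

If $(W, S) \precsim_{\mathsf{MGC}} (W_m, S_m)$ and $(W, S) \overset{e}{\longmapsto} (W', S')$, then
there exist $T$, $W_m'$, $S_m'$ such that $(W_m, S_m) \overset{T}{\longmapsto}{}^{*} (W_m', S_m')$,
$\mathsf{get\_objevt}(e) = \mathsf{get\_objevt}(T)$ and
$(W', S') \precsim_{\mathsf{MGC}} (W_m', S_m')$.

(B.8)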

This  is  proved  by  case  analysis  of  e. 

- If $e = (t, \mathbf{out}, n)$ or $e = (t, \mathbf{clt})$ or $e = (t, \mathbf{term})$, we know the call stack
of the current thread $t$ (which makes the step) is $\circ$ before and after the
step. Then we simply let $(W_m, S_m)$ go zero steps, and hence $T = \epsilon$. Thus
$\mathsf{get\_objevt}(e) = \mathsf{get\_objevt}(T)$ and we can prove $(W', S') \precsim_{\mathsf{MGC}} (W_m, S_m)$.

- If $e = (t, f_i, n)$, we know the call stack of the thread $t$ is $\circ$ before the
step and is $(\sigma_l, x, C')$ after the step. Then we know the code of $t$ in $W_m$
must be MGT. We let it go two steps. After the first step, the code of $t$
becomes $f_{\mathsf{rand}(m)}(\mathsf{rand}());\ \mathsf{MGT}$. We evaluate $\mathsf{rand}(m)$ to $i$ and $\mathsf{rand}()$
to $n$, and make the second step. Thus the resulting configuration satisfies
$(W', S') \precsim_{\mathsf{MGC}} (W_m', S_m')$ and $T = e$.

- If $e = (t, \mathbf{ret}, n)$, we know the call stack of the thread $t$ is $(\sigma_l, x, C')$ before
the step and is $\circ$ after the step. Then we let the code of $t$ in $W_m$ go two steps.
After the first step, the code of $t$ becomes $\mathbf{skip};\ \mathsf{MGT}$. After the second step,
we have $(W', S') \precsim_{\mathsf{MGC}} (W_m', S_m')$. Also we know the first step generates the
event $e$, and thus $\mathsf{get\_objevt}(e) = \mathsf{get\_objevt}(T)$.

- If $e = (t, \mathbf{obj})$, we know the call stack of the thread $t$ is not $\circ$ before or after
the step. We let the code of $t$ in $W_m$ go one step, and hence $T = (t, \mathbf{obj})$
and $(W', S') \precsim_{\mathsf{MGC}} (W_m', S_m')$.

Thus  we  have  proved  (B.8). 

From  (B.8),  we  can  prove  the  following  by  induction  over  the  steps  of  T: 

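If $(W, S) \precsim_{\mathsf{MGC}} (W_m, S_m)$, $(W, S) \overset{T}{\longmapsto}{}^{*} (W', S')$ and

$(\exists i.\ \neg\mathsf{is\_clt}(T(i)) \land T(i) \neq (\_, \mathbf{term}))$, then

there exist $T_m$, $W_m'$, $S_m'$ such that $(W_m, S_m) \overset{T_m}{\longmapsto}{}^{*} (W_m', S_m')$,

$\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$ and $(W', S') \precsim_{\mathsf{MGC}} (W_m', S_m')$.

Then we can get (B.7) by co-induction.

When $(W, S) = (\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast))$, we know $(W, S) \precsim_{\mathsf{MGC}}
(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n \rfloor, (\emptyset, \sigma_o, \circledast))$. Thus we are done. □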

We prove that the MGC version is equivalent to the original version of
lock-freedom.

Lemma 10. lock-free$_\varphi(\Pi) \iff$ lock-free$^{\mathsf{MGC}}_\varphi(\Pi)$.

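Proof. 1. lock-free$_\varphi(\Pi) \implies$ lock-free$^{\mathsf{MGC}}_\varphi(\Pi)$:

We prove the following:

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \land (\sigma_o \in \mathit{dom}(\varphi)) \land \text{lock-free}(T)$
$\implies (\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$

(B.9)

We unfold $\mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$, then we have three cases:

(1) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n \rfloor, (\emptyset, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$

(2) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n \rfloor, (\emptyset, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$

(3) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n \rfloor, (\emptyset, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ \mathbf{abort}$

We know from the operational semantics that (2) is impossible.

For (3), we know from the operational semantics that $\mathsf{last}(T) = (\_, \mathbf{obj}, \mathbf{abort})$.
Thus $\exists i.\ \mathsf{is\_obj\_abt}(T(i))$.

For (1), we prove the following by contradiction:

$\forall n, \sigma_o, T.\ (\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n \rfloor, (\emptyset, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot \implies$

$\forall i.\ \exists j.\ j > i \land (\mathsf{is\_inv}(T(j)) \lor \mathsf{is\_ret}(T(j)) \lor T(j) = (\_, \mathbf{obj}))$

Then, $\forall i.\ \exists j.\ j > i \land (\mathsf{is\_ret}(T(j)) \lor \mathsf{pend\_inv}(T(1..j)) \neq \emptyset)$. Thus by lock-free($T$),
we are done.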

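2. lock-free$^{\mathsf{MGC}}_\varphi(\Pi) \implies$ lock-free$_\varphi(\Pi)$:

For any $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$, by Lemma 9, we know
one of the following holds:

(1) $|T| \neq \omega$; or

(2) there exists $i$ such that $\forall j > i.\ \mathsf{is\_clt}(T(j))$; or

(3) there exists $T_m$ such that

$T_m \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$,
and $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$.

For (1), by Lemma 8, we know lock-free($T$).

For (2), we know lock-free($T$) holds immediately by definition.

For (3), from lock-free$^{\mathsf{MGC}}_\varphi(\Pi)$, we know

$(\exists i.\ \mathsf{is\_obj\_abt}(T_m(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T_m(j)))$.

Thus we have:

$(\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$.

If $\exists i.\ \mathsf{is\_obj\_abt}(T(i))$, we know lock-free($T$). Otherwise, we know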




$\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j))$.


Thus, for any $i$, if $\mathsf{pend\_inv}(T(1..i)) \neq \emptyset$, then there exists $j > i$ such that
$\mathsf{is\_ret}(T(j))$. Therefore lock-free($T$) and we are done.

□ 

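Then, we only need to prove the following (B.11), (B.12) and (B.13):

$\Pi \sqsubseteq^{\omega}_\varphi \Pi_A \implies \Pi \sqsubseteq_\varphi \Pi_A$ (B.11)

$\Pi \sqsubseteq^{\omega}_\varphi \Pi_A \implies \text{lock-free}^{\mathsf{MGC}}_\varphi(\Pi)$ (B.12)

$\Pi \sqsubseteq_\varphi \Pi_A \land \text{lock-free}_\varphi(\Pi) \implies \Pi \sqsubseteq^{\omega}_\varphi \Pi_A$ (B.13)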

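Proofs of (B.11) For any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$,
for any $T$, if

$T \in \mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$,

we know there exists $T_1$ such that $T = \mathsf{get\_obsv}(T_1)$ and

$T_1 \in \mathcal{T}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$.

Thus there exist $T_1'$ and $T_1''$ such that $T_1'' = T_1 :: T_1'$ and one of the following
holds:

(i) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T_1''}{\longmapsto}{}^{\omega}\ \cdot$; or

(ii) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T_1''}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$; or

(iii) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T_1''}{\longmapsto}{}^{*}\ \mathbf{abort}$.

That is,

$T_1'' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$.

Since $\Pi \sqsubseteq^{\omega}_\varphi \Pi_A$, we know there exists $T_2'$ such that

$T_2' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$,

and

$\mathsf{get\_obsv}(T_2') = \mathsf{get\_obsv}(T_1'') = T :: \mathsf{get\_obsv}(T_1')$.

Thus there exists $T_2$ such that

$T_2 \in \mathcal{T}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$,
and $\mathsf{get\_obsv}(T_2) = T$. Thus

$T \in \mathcal{O}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$,

and we are done.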
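
Proofs of (B.12) We construct another most general client as follows:


$\mathsf{MGTpl} \overset{\mathrm{def}}{=} \mathbf{while}\ (\mathbf{true})\{\ f_{\mathsf{rand}(m)}(\mathsf{rand}());\ \mathbf{print}(1);\ \}$

$\mathsf{MGCpl}_n \overset{\mathrm{def}}{=} \parallel_{i \in [1..n]} \mathsf{MGTpl}$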

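The following lemma describes the relationship between MGCpl and MGC:

Lemma 11. (1) For any $T$, if

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$,
then there exists $T_p$ such that

$T_p \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_o, \circledast)]\!]$,

$T_p \backslash (\_, \mathbf{out}, 1) = T$ and

$\forall i, t.\ T_p(i) = (t, \mathbf{ret}, \_) \implies T_p(i + 1) = (t, \mathbf{out}, 1)$.

(2) For any $T_p$, if

$T_p \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_o, \circledast)]\!]$,

then there exists $T$ such that

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$
and $T_p \backslash (\_, \mathbf{out}, 1) = T$.

Here we use $T_p \backslash (\_, \mathbf{out}, 1)$ to mean the sub-trace of $T_p$ obtained by removing all the
events of the form $(\_, \mathbf{out}, 1)$.

Proof. By constructing simulations between executions of $\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n$ and
$\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCpl}_n$. □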
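
The trace-level relationship stated in Lemma 11 can be illustrated by a short Python sketch. It only manipulates finite lists of events under a hypothetical tuple encoding (tid, kind, payload); it is not the simulation argument used in the proof, and it ignores client steps that a real execution of MGCpl would also contain.

```python
def is_ret(e):
    # Return event, encoded here (by assumption) as (tid, "ret", value).
    return len(e) >= 2 and e[1] == "ret"


def is_out1(e):
    # The event (t, out, 1) produced by print(1).
    return len(e) == 3 and e[1] == "out" and e[2] == 1


def add_prints(trace):
    """Build a T_p-like trace from an MGC-like trace T by placing
    (t, out, 1) directly after every return event of thread t."""
    result = []
    for e in trace:
        result.append(e)
        if is_ret(e):
            result.append((e[0], "out", 1))
    return result


def remove_out1(trace):
    """The operation written T_p \\ (_, out, 1) in the text."""
    return [e for e in trace if not is_out1(e)]


T = [(1, "inv", 3), (2, "inv", 5), (1, "obj"), (1, "ret", 0), (2, "ret", 1)]
Tp = add_prints(T)
```

Since an MGC trace contains no output events, removing the (_, out, 1) events from the constructed trace gives back the original one, and every return is immediately followed by the print of the same thread, as in part (1) of the lemma.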

Lemma 12. Suppose $\Pi_A$ is total.

For any $n$, $\sigma_a$ and $T$, if $T \in \mathcal{O}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_a, \circledast)]\!]$, then $T$ is an
infinite trace of $(\_, \mathbf{out}, 1)$.

Proof. We need to prove: for any $T$ such that
$T \in \mathcal{O}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_a, \circledast)]\!]$, the following hold:

(1) $|T| = \omega$,

(2) for any $i$, $T(i) = (\_, \mathbf{out}, 1)$.

For (1): we can prove for any $T'$ such that

$T' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_a, \circledast)]\!]$,

we have $|T'| = \omega$. If $|T| \neq \omega$, we know there exists $i$ such that

$\forall j > i.\ \mathsf{is\_inv}(T'(j)) \lor \mathsf{is\_ret}(T'(j)) \lor T'(j) = (\_, \mathbf{obj}) \lor T'(j) = (\_, \mathbf{clt})$.

Since $\Pi_A$ is total, from the code and the operational semantics, we know this is
impossible.

(2) is easily proved from $|T| = \omega$ and that the code can only produce
$(\_, \mathbf{out}, 1)$ as observable events. □

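To prove lock-free$^{\mathsf{MGC}}_\varphi(\Pi)$, we want to show: for any $n$, $\sigma_o$, $\sigma_a$ and $T$, if
$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$ and $\varphi(\sigma_o) = \sigma_a$, then

$(\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$ (B.14)

First, if $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$, by Lemma 11(1), there exists
$T_p$ such that $T_p \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_o, \circledast)]\!]$ and $T_p \backslash (\_, \mathbf{out}, 1) = T$.
Since $\Pi \sqsubseteq^{\omega}_\varphi \Pi_A$, we know

$\mathcal{O}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_o, \circledast)]\!] \subseteq \mathcal{O}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_a, \circledast)]\!]$.

From Lemma 12, we know for any $T$, if $T \in \mathcal{O}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_o, \circledast)]\!]$,
then $T$ is an infinite trace of $(\_, \mathbf{out}, 1)$.

Then we know: $\mathsf{get\_obsv}(T_p)$ is an infinite trace of $(\_, \mathbf{out}, 1)$.

Thus $|T_p| = \omega$ and


$\forall i.\ \exists j.\ j > i \land T_p(j) = (\_, \mathbf{out}, 1)$. (B.15)

We prove the following:

$\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T_p(j))$. (B.16)

This is proved as follows. From $|T_p| = \omega$ and (B.15), we know for any $i$, there
exist $j_1, \ldots, j_{n+1}$ such that $i < j_1 < \cdots < j_{n+1}$ and $\forall k \in [1..n+1].\ T_p(j_k) =
(\_, \mathbf{out}, 1)$. Then, by the pigeonhole principle, we know there exists a thread $t$
producing two $(t, \mathbf{out}, 1)$-s. Suppose $j_k$ and $j_l$ are the indexes of the two events
produced by $t$ and $j_k < j_l$. By the operational semantics, we know there exists
$j'$ such that $i < j_k < j' < j_l$ and $\mathsf{is\_ret}(T_p(j'))$. Thus we have proved (B.16).
Since $T_p \backslash (\_, \mathbf{out}, 1) = T$, from (B.16), we know (B.14) holds and we are done.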
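
The pigeonhole step can be made concrete with a small Python sketch that scans a finite window of a trace of the n-thread client MGCpl. The tuple encoding of events and the function name are hypothetical; the sketch only finds the witness indices whose existence the argument above derives from the operational semantics, and it uses 0-based positions.

```python
def is_ret(e):
    # Return event, encoded here (by assumption) as (tid, "ret", value).
    return len(e) >= 2 and e[1] == "ret"


def is_out1(e):
    # The event (t, out, 1) produced by print(1).
    return len(e) == 3 and e[1] == "out" and e[2] == 1


def ret_between_repeated_prints(window):
    """Look for two (t, out, 1) events of the same thread t with a return
    event of t strictly between them.
    Returns 0-based indices (j_k, j_ret, j_l), or None if none is found."""
    last_print = {}  # thread id -> index of its most recent (t, out, 1)
    for j, e in enumerate(window):
        if not is_out1(e):
            continue
        t = e[0]
        if t in last_print:
            j_k = last_print[t]
            for j_ret in range(j_k + 1, j):
                if is_ret(window[j_ret]) and window[j_ret][0] == t:
                    return (j_k, j_ret, j)
        last_print[t] = j
    return None


# Two threads and three prints: some thread must print twice.
window = [
    (1, "inv", 0), (2, "inv", 0), (1, "ret", 0), (1, "out", 1),
    (2, "ret", 0), (2, "out", 1), (1, "inv", 0), (1, "ret", 0),
    (1, "out", 1),
]
witness = ret_between_repeated_prints(window)
```

With n threads, any window holding n + 1 prints contains a repeated printer; for traces generated by the loop in MGTpl, that thread must have completed a method call between its two prints, which is the return event the sketch reports.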


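Proofs of (B.13) We need to prove that if $\Pi \sqsubseteq_\varphi \Pi_A$ and lock-free$_\varphi(\Pi)$, then
for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$, we have

$\mathcal{O}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$

$\subseteq \mathcal{O}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$.

Thus we only need to prove: for any $T$,

(1) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ \mathbf{abort}$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(2) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(3) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$ and

$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.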

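Actually neither (1) nor (2) depends on progress properties. We can prove the
following lemma.

Lemma 13. If $\Pi \sqsubseteq_\varphi \Pi_A$, then for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$, $\sigma_a$ and $T$ such
that $\varphi(\sigma_o) = \sigma_a$, we have

1. If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ \mathbf{abort}$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

2. If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

Proof. 1. We know $\mathsf{is\_abt}(\mathsf{last}(T))$. By $\Pi \sqsubseteq_\varphi \Pi_A$, we know there exists $T_a$ such
that

$T_a \in \mathcal{T}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$

and $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$. Thus $\mathsf{is\_abt}(\mathsf{last}(T_a))$, and by the operational
semantics, we know

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{*}\ \mathbf{abort}$,

and we are done.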

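2. (a) If $n = 1$, we know

$(\mathbf{let}\ \Pi\ \mathbf{in}\ \{C; \mathbf{end}\}, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$.

Thus there exists $T''$ such that $T = T'' :: (1, \mathbf{term})$. Let

$T' = T'' :: (1, \mathbf{clt}) :: (1, \mathbf{out}, \text{"done"}) :: (1, \mathbf{clt}) :: (1, \mathbf{term})$,
where we assume $(1, \mathbf{out}, \text{"done"})$ is different from all the events in $T$,
then

$(\mathbf{let}\ \Pi\ \mathbf{in}\ \{C; \mathbf{print}(\text{"done"}); \mathbf{end}\}, (\sigma_c, \sigma_o, \circledast)) \overset{T'}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$.
Since $\Pi \sqsubseteq_\varphi \Pi_A$, we know there exists $T_a'$ such that

$T_a' \in \mathcal{T}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \{C; \mathbf{print}(\text{"done"}); \mathbf{end}\}), (\sigma_c, \sigma_a, \circledast)]\!]$
and $\mathsf{get\_obsv}(T') = \mathsf{get\_obsv}(T_a')$. Thus we know there exists $T_a''$ such
that

$T_a' = T_a'' :: (1, \mathbf{out}, \text{"done"}) :: (1, \mathbf{clt}) :: (1, \mathbf{term})$,

and by the operational semantics, we know there exists $T_a$ such that
$T_a'' = T_a :: (1, \mathbf{clt})$ and

$(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \{C; \mathbf{end}\}, (\sigma_c, \sigma_a, \circledast)) \longmapsto^{*} (\mathbf{skip}, \_)$.

Also we have $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.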

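(b) If $n > 1$, we construct another program $\mathbf{let}\ \Pi\ \mathbf{in}\ C_1' \parallel \ldots \parallel C_n'$ as follows:
we pick $n - 1$ fresh variables: $d_2, \ldots, d_n$,

$C_1' = (C_1;\ \mathbf{if}\ (d_2\ \&\&\ \ldots\ \&\&\ d_n)\ \mathbf{print}(\text{"done"});)$

$C_i' = (C_i;\ d_i := \mathbf{true}) \quad \forall i \in [2..n]$

and also let

$\sigma_c' = \sigma_c \uplus \{d_2 \mapsto \mathbf{false}, \ldots, d_n \mapsto \mathbf{false}\}$.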
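
The instrumentation can be sketched in Python as a purely syntactic transformation on client programs. Here clients are plain strings and the function name `instrument` is hypothetical; the sketch only mirrors the shape of the construction and does not model the operational semantics.

```python
def instrument(clients):
    """Given client programs C_1..C_n (n > 1) as strings, build C'_1..C'_n
    and the initial values of the fresh flag variables d_2..d_n."""
    n = len(clients)
    if n < 2:
        raise ValueError("the construction is only used for n > 1")
    flags = [f"d{i}" for i in range(2, n + 1)]
    guard = " && ".join(flags)
    # Thread 1 prints "done" only if every other thread has set its flag.
    first = f'({clients[0]}; if ({guard}) print("done");)'
    # Each thread i >= 2 sets its own flag after finishing C_i.
    rest = [f"({c}; d{i} := true)" for i, c in enumerate(clients[1:], start=2)]
    initial_flags = {d: False for d in flags}
    return [first] + rest, initial_flags


new_clients, flag_init = instrument(["C1", "C2", "C3"])
```

The flags start out false, so the output "done" can be produced only in executions where threads 2 to n have all finished their original code before thread 1 evaluates the guard. This is what lets the observable event serve as a termination witness in the argument that follows.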


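Then, if


$(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,
let $T''$ be the result after removing all the termination markers in $T$, and
$T' = T'' :: (2, \mathbf{clt}) :: (2, \mathbf{clt}) :: \ldots :: (n, \mathbf{clt}) :: (n, \mathbf{clt})$

$:: (1, \mathbf{clt}) :: (1, \mathbf{clt}) :: (1, \mathbf{out}, \text{"done"})$

$:: (1, \mathbf{clt}) :: (1, \mathbf{term}) :: \ldots :: (n, \mathbf{clt}) :: (n, \mathbf{term})$
where we still assume $(1, \mathbf{out}, \text{"done"})$ is different from all the events in
$T$, we can prove:

$(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1' \parallel \ldots \parallel C_n' \rfloor, (\sigma_c', \sigma_o, \circledast)) \overset{T'}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$.

Since $\Pi \sqsubseteq_\varphi \Pi_A$, we know there exists $T_a'$ such that

$T_a' \in \mathcal{T}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1' \parallel \ldots \parallel C_n'), (\sigma_c', \sigma_a, \circledast)]\!]$
and $\mathsf{get\_obsv}(T') = \mathsf{get\_obsv}(T_a')$. Thus we know there exists $i$ such that
$T_a'(i) = (1, \mathbf{out}, \text{"done"})$. Then we know

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1' \parallel \ldots \parallel C_n' \rfloor, (\sigma_c', \sigma_a, \circledast)) \longmapsto^{*} (\mathbf{skip}, \_)$.

We can remove all the actions of the newly added commands, construct
a simulation between the two executions, and prove: there exists $T_a$ such
that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,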

and  get.obsv(Ta)  =  get.obsv(Ta)  =  get.obsv(r). 

Thus  we  are  done.  □ 


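For (3), we define the simulation relation $\precsim$ in Figure 11(d), and prove the
following (B.17) by case analysis and the operational semantics:

For any $W_1$, $S_1$, $W_2$, $S_2$, $W_3$, $S_3$ and $e_1$,
if $(W_1, S_1) \precsim (W_2, S_2; W_3, S_3)$ and $(W_1, S_1) \overset{e_1}{\longmapsto} (W_1', S_1')$,
then there exist $T_2$, $W_2'$, $S_2'$, $T_3$, $W_3'$ and $S_3'$ such that

$(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*} (W_2', S_2')$, $(W_3, S_3) \overset{T_3}{\longmapsto}{}^{*} (W_3', S_3')$,

$T_3 \backslash (\_, \mathbf{obj}) = e_1 \backslash (\_, \mathbf{obj})$ and $(W_1', S_1') \precsim (W_2', S_2'; W_3', S_3')$.

(B.17)

With (B.17), we can prove the following (B.18) by induction over the length of
$T_1$:

For any $W_1$, $S_1$, $W_2$, $S_2$, $W_3$, $S_3$ and $T_1$,

if $(W_1, S_1) \precsim (W_2, S_2; W_3, S_3)$, $(W_1, S_1) \overset{T_1}{\longmapsto}{}^{+} (W_1', S_1')$ and
$\mathsf{last}(T_1) \neq (\_, \mathbf{obj})$,

then there exist $T_2$, $W_2'$, $S_2'$, $T_3$, $W_3'$ and $S_3'$ such that

$(W_2, S_2) \overset{T_2}{\longmapsto}{}^{*} (W_2', S_2')$, $(W_3, S_3) \overset{T_3}{\longmapsto}{}^{+} (W_3', S_3')$,

$T_1 \backslash (\_, \mathbf{obj}) = T_3 \backslash (\_, \mathbf{obj})$ and $(W_1', S_1') \precsim (W_2', S_2'; W_3', S_3')$.

(B.18)

With (B.18), we can prove the following (B.19):

For any $W$, $S$, $W_1$, $S_1$, $W_2$, $S_2$, $W_3$, $S_3$, $T_0$ and $T_1$,

if $(W, S)$ is well-formed and out of method calls, $(W, S) \overset{T_0}{\longmapsto}{}^{*} (W_1, S_1)$,

$(W_1, S_1) \precsim (W_2, S_2; W_3, S_3)$, $(W_1, S_1) \overset{T_1}{\longmapsto}{}^{\omega}\ \cdot$ and lock-free($T_0 :: T_1$),

then there exists $T_3$ such that $(W_3, S_3) \overset{T_3}{\longmapsto}{}^{\omega}\ \cdot$ and

$T_1 \backslash (\_, \mathbf{obj}) = T_3 \backslash (\_, \mathbf{obj})$.

(B.19)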

We prove (B.19) as follows. Let $T = T_0 :: T_1$. Since lock-free($T$), we know one of
the following holds:

(i) there exists $i$ such that $\forall j > i.\ \mathsf{is\_clt}(T(j))$; or

(ii) for any $i$, if $\mathsf{pend\_inv}(T(1..i)) \neq \emptyset$, then there exists $j > i$ such that $\mathsf{is\_ret}(T(j))$.
For  (i),  we  know  there  exist  1F{,  T{  and  T"  such  that 

iWi,Si)  ^+{Wi,S[),  {Wi,S'i)^‘^-, 

Ti  =  T{  ■■Ti' ,  Ti  =  Tiil.A) ,  is_clt(last(T()) ,  Vj.  is_clt(T"(j)) . 

By  (B.18),  we  know:  there  exist  T2,  IF2,  S^,  Tg,  IF3  and  S3  such  that 

iW2,S2)  ^*iWi,Si),  {W3,S3)  ^+m,S'3),  T{\(.,obj)  =  n\{-,ohj)  and 
(1F(,5()  (IF2, 52;  IF3, 53).  Then  by  coinduction  over  Ti  and  from  (B.18),  we 

get:  there  exists  Tg'  such  that 

iW',S'3)  •  and  T"\(_,obj)  =  T"\(_,obj). 

Let  Tg  =  Tg  ■■:T^',  and  we  know 

(1^3,53)  and  Ti\(_,obj)  =r3\(_,obj). 

Suppose (i) does not hold. Thus we know

$\forall i.\ \exists j.\ j > i \land \mathsf{is\_obj}(T(j))$.

By the operational semantics, we know

$\forall i.\ \exists j.\ j > i \land \mathsf{pend\_inv}(T(1..j)) \neq \emptyset$.

Since (ii) holds, we know

$\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j))$.

Then by coinduction and from (B.18), we know there exists $T_3$ such that

$(W_3, S_3) \overset{T_3}{\longmapsto}{}^{\omega}\ \cdot$ and $T_1 \backslash (\_, \mathbf{obj}) = T_3 \backslash (\_, \mathbf{obj})$.

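Thus we have proved (B.19). On the other hand, for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$
and $\sigma_a$, by Lemma 1, we know

$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$.
From $\Pi \sqsubseteq_\varphi \Pi_A$, by Lemma 3, we know $\Pi \preceq_\varphi \Pi_A$. Thus, if $\varphi(\sigma_o) = \sigma_a$, then

$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$.

Then we know

$(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n, (\sigma_c, \sigma_o, \circledast))$

$\precsim (\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n, (\emptyset, \sigma_a, \circledast);$
$\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n, (\sigma_c, \sigma_a, \circledast))$.

Thus, if $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$, by lock-free$_\varphi(\Pi)$, we know
lock-free($T$). Then from (B.19) we get: there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$

and $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$. Thus $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$ and we are done.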

B.4  Proofs  of  Theorem  ?? 

Similar  to  Lemma  8,  we  can  prove  the  following  lemma. 

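Lemma 14 (Finite trace must be wait-free). For any $T$, if

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$

and $|T| \neq \omega$, then wait-free($T$) must hold.

We define the MGC version of wait-freedom, and prove it is equivalent to the
original version.

Definition 8. wait-free$^{\mathsf{MGC}}_\varphi(\Pi)$, iff

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \land (\sigma_o \in \mathit{dom}(\varphi))$

$\implies \text{wait-free}(T)$

Lemma 15. wait-free$_\varphi(\Pi) \iff$ wait-free$^{\mathsf{MGC}}_\varphi(\Pi)$.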

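Proof. 1. wait-free$_\varphi(\Pi) \implies$ wait-free$^{\mathsf{MGC}}_\varphi(\Pi)$:

Trivial.

2. wait-free$^{\mathsf{MGC}}_\varphi(\Pi) \implies$ wait-free$_\varphi(\Pi)$:

For any $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$, by Lemma 9, we know
one of the following holds: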




(1) $|T| \neq \omega$; or

(2) there exists $i$ such that $\forall j > i.\ \mathsf{is\_clt}(T(j))$; or

(3) there exists $T_m$ such that

$T_m \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$,
and $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$.

For (1), by Lemma 14, we know wait-free($T$) holds.

For (2), we know $|T| = \omega$.

For any $k$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..k))$, we know one of the following must
hold:

(i) $\exists j.\ j > k \land \mathsf{match}(e, T(j))$.

(ii) $\forall j.\ j > k \implies \neg\mathsf{match}(e, T(j))$. Thus we can prove:

$\forall j > k.\ e \in \mathsf{pend\_inv}(T(1..j))$.

Let $l = \max(i, k)$. Then we know:

$\forall j > l.\ \mathsf{is\_clt}(T(j)) \land e \in \mathsf{pend\_inv}(T(1..j))$.

Thus by the operational semantics, we can prove:

$\forall j > l.\ \mathsf{tid}(T(j)) \neq \mathsf{tid}(e)$.

Thus we know wait-free($T$).

For (3), suppose (1) does not hold for $T$, and we only need to prove the
following:

for any $i$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..i))$, then there exists $j > i$ such
that either $\forall k > j.\ \mathsf{tid}(T(k)) \neq \mathsf{tid}(e)$ or $\mathsf{match}(e, T(j))$.

From $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$, we know

$\neg\exists i.\ \mathsf{is\_obj\_abt}(T_m(i))$.

Then by the operational semantics and the generation of $T_m$, we know

$\neg\exists i.\ \mathsf{is\_abt}(T_m(i))$.

From wait-free$^{\mathsf{MGC}}_\varphi(\Pi)$, we know wait-free($T_m$), then we have

for any $i$ and $e$, if $e \in \mathsf{pend\_inv}(T_m(1..i))$, then there exists $j > i$ such
that either $\forall k > j.\ \mathsf{tid}(T_m(k)) \neq \mathsf{tid}(e)$ or $\mathsf{match}(e, T_m(j))$.

For any $i$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..i))$, since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$,
we know there exists $i_m$ such that

$e \in \mathsf{pend\_inv}(T_m(1..i_m))$ and $\mathsf{get\_objevt}(T(1..i)) = \mathsf{get\_objevt}(T_m(1..i_m))$.

We know there exists $j_m > i_m$ such that one of the following holds:

(i) $\mathsf{match}(e, T_m(j_m))$; or

(ii) $\forall k > j_m.\ \mathsf{tid}(T_m(k)) \neq \mathsf{tid}(e)$.

For (i), since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$, we know there exists $j > i$
such that $\mathsf{match}(e, T(j))$.

For (ii), suppose

$\forall j > i.\ \neg\mathsf{match}(e, T(j))$ and $\forall j > i.\ \exists k > j.\ \mathsf{tid}(T(k)) = \mathsf{tid}(e)$.

Since $e \in \mathsf{pend\_inv}(T(1..i))$, by the operational semantics, we know
$\forall j > i.\ \exists k > j.\ T(k) = (\mathsf{tid}(e), \mathbf{obj})$.
Since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$, we know


$\forall j > i_m.\ \exists k > j.\ T_m(k) = (\mathsf{tid}(e), \mathbf{obj})$,

which contradicts (ii). Thus we get wait-free($T$) and we are done.

□ 

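Then, we only need to prove the following (B.20), (B.21) and (B.22):

$\Pi \sqsubseteq^{t\omega}_\varphi \Pi_A \implies \Pi \sqsubseteq^{\omega}_\varphi \Pi_A$ (B.20)

$\Pi \sqsubseteq^{t\omega}_\varphi \Pi_A \implies \text{wait-free}^{\mathsf{MGC}}_\varphi(\Pi)$ (B.21)

$\Pi \sqsubseteq_\varphi \Pi_A \land \text{wait-free}_\varphi(\Pi) \implies \Pi \sqsubseteq^{t\omega}_\varphi \Pi_A$ (B.22)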

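Proofs of (B.20) For any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$,
for any $T$, suppose

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$.

Since $\Pi \sqsubseteq^{t\omega}_\varphi \Pi_A$, we know there exists $T_a$ such that

$T_a \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$,
$\mathsf{get\_obsv}(T_a) = \mathsf{get\_obsv}(T)$ and $\mathsf{div\_tids}(T_a) = \mathsf{div\_tids}(T)$.

Thus we are done.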


Proofs of (B.21) Just like the proofs of (B.12), we use the most general client
MGCpl. We first prove the following lemma:

Lemma 16. Suppose $\Pi_A$ is total.

For any $n$, $\sigma_a$, $T$ and $S$, if $(T, S) \in \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_a, \circledast)]\!]$, then
$\mathsf{div\_tids}(T) = S$.

Proof. We know there exists $T_1$ such that

$T_1 \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_a, \circledast)]\!]$,

$T = \mathsf{get\_obsv}(T_1)$ and $S = \mathsf{div\_tids}(T_1)$.

It is easy to see that $\mathsf{div\_tids}(T) \subseteq S$.

On the other hand, for all $t \in S$, we know:

$\forall i.\ \exists j.\ j > i \land \mathsf{tid}(T_1(j)) = t$.

By the operational semantics and the generation of $T_1$, we know
$\forall i.\ \exists j.\ j > i \land T_1(j) = (t, \mathbf{out}, 1)$.

Thus we can prove:

$\forall i.\ \exists j.\ j > i \land \mathsf{tid}(T(j)) = t$.

Thus $t \in \mathsf{div\_tids}(T)$, and we are done.


□
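
As an informal illustration of Lemma 16, the Python sketch below works on ultimately periodic traces and looks only at the repeating part (a finite, non-empty cycle). This representation, the tuple encoding of events, and the reading of div_tids as "threads that occur infinitely often" (suggested by the quantifier pattern used in the proof) are all assumptions of the sketch, not definitions taken from the text.

```python
def div_tids(cycle):
    """Threads occurring infinitely often in prefix . cycle^omega:
    exactly the threads that occur in the repeated cycle."""
    return {e[0] for e in cycle}


def get_obsv(events):
    """Assumed observable projection: keep output events only
    (abort events are left out of this sketch)."""
    return [e for e in events if e[1] == "out"]


# Every call returns and is followed by print(1), as in MGTpl.
cycle_total = [
    (1, "inv", 0), (1, "obj"), (1, "ret", 0), (1, "out", 1),
    (2, "inv", 0), (2, "ret", 0), (2, "out", 1),
]

# Thread 2 keeps taking object steps without ever returning.
cycle_stuck = [(1, "inv", 0), (1, "ret", 0), (1, "out", 1), (2, "obj")]

agree_when_total = div_tids(get_obsv(cycle_total)) == div_tids(cycle_total)
agree_when_stuck = div_tids(get_obsv(cycle_stuck)) == div_tids(cycle_stuck)
```

In the first cycle every running thread prints infinitely often, so the divergent threads computed from the observable trace coincide with those of the full trace. The second cycle shows how the equality can fail when a thread runs forever inside a method, which, on our reading, is the situation the totality assumption on the abstract object excludes.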


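For any $n$, $\sigma_o$, $\sigma_a$ and $T$ such that $\varphi(\sigma_o) = \sigma_a$, if

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$,
by Lemma 11(1), there exists $T_p$ such that

$T_p \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_o, \circledast)]\!]$ and $T_p \backslash (\_, \mathbf{out}, 1) = T$.
Suppose $\neg\exists i.\ \mathsf{is\_abt}(T(i))$.

Then for any $i$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..i))$, we know there exists $i_p$ such that
$e \in \mathsf{pend\_inv}(T_p(1..i_p))$ and $(T_p(1..i_p)) \backslash (\_, \mathbf{out}, 1) = T(1..i)$.

Let $t = \mathsf{tid}(e)$, we suppose

$\forall j > i.\ \exists k > j.\ \mathsf{tid}(T(k)) = \mathsf{tid}(e) = t$.

Since $T_p \backslash (\_, \mathbf{out}, 1) = T$, we know:

$\forall j > i_p.\ \exists k > j.\ \mathsf{tid}(T_p(k)) = t$.

Thus we know


$t \in \mathsf{div\_tids}(T_p)$.

On the other hand, since $\Pi \sqsubseteq^{t\omega}_\varphi \Pi_A$, we know:

$\mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_o, \circledast)]\!] \subseteq \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCpl}_n), (\emptyset, \sigma_a, \circledast)]\!]$.
Then from Lemma 16, we know

$\mathsf{div\_tids}(T_p) = \mathsf{div\_tids}(\mathsf{get\_obsv}(T_p))$.


Thus


$t \in \mathsf{div\_tids}(\mathsf{get\_obsv}(T_p))$,

and then we can prove:

$\forall j.\ \exists k > j.\ T_p(k) = (t, \mathbf{out}, 1)$.

Then since $e \in \mathsf{pend\_inv}(T_p(1..i_p))$ and by the operational semantics, we know
there must exist $j$ such that $j > i_p$ and $\mathsf{match}(e, T_p(j))$.

Since $T_p \backslash (\_, \mathbf{out}, 1) = T$, we know:

there exists $j$ such that $j > i$ and $\mathsf{match}(e, T(j))$.

Thus wait-free($T$) and we are done.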
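
Proofs of (B.22) We need to prove that if $\Pi \sqsubseteq_\varphi \Pi_A$ and wait-free$_\varphi(\Pi)$, then
for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$, we have

$\mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$

$\subseteq \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$.

Thus we only need to prove: for any $T$,

(1) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ \mathbf{abort}$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(2) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$ and

$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(3) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$,

$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$ and $\mathsf{div\_tids}(T) = \mathsf{div\_tids}(T_a)$.

(1) and (2) are proved in Lemma 13.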

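For (3), we define the simulation relation $\precsim$ in Figure 11(d), and as in the
proof for (B.13), we can get the following (B.23) from (B.19) and the fact that
wait-free($T_0 :: T_1$) implies lock-free($T_0 :: T_1$):

For any $W$, $S$, $W_1$, $S_1$, $W_2$, $S_2$, $W_3$, $S_3$, $T_0$ and $T_1$,

if $(W, S)$ is well-formed and out of method calls, $(W, S) \overset{T_0}{\longmapsto}{}^{*} (W_1, S_1)$,

$(W_1, S_1) \precsim (W_2, S_2; W_3, S_3)$, $(W_1, S_1) \overset{T_1}{\longmapsto}{}^{\omega}\ \cdot$ and wait-free($T_0 :: T_1$),

then there exists $T_3$ such that $(W_3, S_3) \overset{T_3}{\longmapsto}{}^{\omega}\ \cdot$ and

$T_1 \backslash (\_, \mathbf{obj}) = T_3 \backslash (\_, \mathbf{obj})$.

(B.23)

On the other hand, for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$, by Lemma 1, we know
$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!]$.
From $\Pi \sqsubseteq_\varphi \Pi_A$, by Lemma 3, we know $\Pi \preceq_\varphi \Pi_A$. Thus, if $\varphi(\sigma_o) = \sigma_a$, then
$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledast)]\!]$.

Then we know

$(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n, (\sigma_c, \sigma_o, \circledast))$

$\precsim (\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n, (\emptyset, \sigma_a, \circledast);$
$\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n, (\sigma_c, \sigma_a, \circledast))$.

Thus, if $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$, by wait-free$_\varphi(\Pi)$, we know
wait-free($T$). Then from (B.23) we get: there exists $T_a$ such that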
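$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_a, \circledast)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$ and $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$.

Thus we know $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

Below we prove: $\mathsf{div\_tids}(T) = \mathsf{div\_tids}(T_a)$.

(a) $\mathsf{div\_tids}(T) \subseteq \mathsf{div\_tids}(T_a)$:

For any $i$, since $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$, we know there exists $i'$ such that
$T(1..i') \backslash (\_, \mathbf{obj}) = T_a(1..i) \backslash (\_, \mathbf{obj})$. For any $t \in \mathsf{div\_tids}(T)$, we know
$\exists j'.\ j' > i' \land \mathsf{tid}(T(j')) = t$.

If $T(j') \neq (t, \mathbf{obj})$, since $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$, we know there exists $j > i$
such that $T_a(j) = T(j')$.

Otherwise, $T(j') = (t, \mathbf{obj})$. By the operational semantics and the generation
of $T$, we know there exists $e$ such that

$e \in \mathsf{pend\_inv}(T(1..j' - 1))$ and $\mathsf{tid}(e) = t$.

Since wait-free($T$), we know one of the following holds:

(i) there exists $l > j'$ such that $\forall k > l.\ \mathsf{tid}(T(k)) \neq t$; or

(ii) there exists $j'' > j'$ such that $\mathsf{match}(e, T(j''))$.

Suppose (i) holds. Since $t \in \mathsf{div\_tids}(T)$, we know

$\exists j''.\ j'' > l \land \mathsf{tid}(T(j'')) = t$,

which is a contradiction.

Thus (ii) must hold. Thus $T(j'') = (t, \mathbf{ret}, \_)$ and $j'' > i'$. Since $T \backslash (\_, \mathbf{obj}) =
T_a \backslash (\_, \mathbf{obj})$, we know there exists $j > i$ such that $T_a(j) = T(j'')$.

Thus we have proved

$\exists j.\ j > i \land \mathsf{tid}(T_a(j)) = t$.

Therefore $t \in \mathsf{div\_tids}(T_a)$.

(b) $\mathsf{div\_tids}(T_a) \subseteq \mathsf{div\_tids}(T)$:

For any $i'$, since $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$, we know there exists $i$ such that
$T(1..i') \backslash (\_, \mathbf{obj}) = T_a(1..i) \backslash (\_, \mathbf{obj})$. For any $t \in \mathsf{div\_tids}(T_a)$, we know

$\exists j.\ j > i \land \mathsf{tid}(T_a(j)) = t$.

If $T_a(j) \neq (t, \mathbf{obj})$, since $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$, we know there exists
$j' > i'$ such that $T_a(j) = T(j')$.

Otherwise, $T_a(j) = (t, \mathbf{obj})$. By the operational semantics and the generation
of $T_a$, we know one of the following holds:

(i) $\forall k > j.\ \mathsf{tid}(T_a(k)) \neq t$; or

(ii) there exists $j'' > j$ such that $\mathsf{match}(e, T_a(j''))$.

Suppose (i) holds. Since $t \in \mathsf{div\_tids}(T_a)$, we know

$\exists j''.\ j'' > j \land \mathsf{tid}(T_a(j'')) = t$,

which is a contradiction.

Thus (ii) must hold. Thus $T_a(j'') = (t, \mathbf{ret}, \_)$ and $j'' > i$. Since $T \backslash (\_, \mathbf{obj}) =
T_a \backslash (\_, \mathbf{obj})$, we know there exists $j' > i'$ such that $T_a(j'') = T(j')$.

Thus we have proved

$\exists j'.\ j' > i' \land \mathsf{tid}(T(j')) = t$.

Therefore $t \in \mathsf{div\_tids}(T)$.

Thus we are done.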

B.5 Proofs of Theorem ??

Lemma 17 (Finite trace must be obstruction-free). For any $T$, if

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$
and $|T| \neq \omega$, then obstruction-free($T$) must hold.

We define the MGC version of obstruction-freedom, and prove it is equivalent
to the original version.

Definition 9. obstruction-free$^{\mathsf{MGC}}_\varphi(\Pi)$, iff

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \land \mathsf{iso}(T) \land (\sigma_o \in \mathit{dom}(\varphi))$

$\implies (\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$.

Lemma 18. obstruction-free$_\varphi(\Pi) \iff$ obstruction-free$^{\mathsf{MGC}}_\varphi(\Pi)$.

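Proof. From Figure 7, we know obstruction-free$_\varphi(\Pi)$ is equivalent to the
following:

$\forall n, C_1, \ldots, C_n, \sigma_c, \sigma_o, T.$

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!] \land \mathsf{iso}(T) \land (\sigma_o \in \mathit{dom}(\varphi))$
$\implies \text{lock-free}(T)$

By Lemma 10, we know it is equivalent to the following:

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledast)]\!] \land \mathsf{iso}(T) \land (\sigma_o \in \mathit{dom}(\varphi))$

$\implies (\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$.

Thus we are done. □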

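Then, we only need to prove the following (B.24), (B.25) and (B.26):

$\Pi \sqsubseteq^{i\omega}_\varphi \Pi_A \implies \Pi \sqsubseteq_\varphi \Pi_A$ (B.24)

$\Pi \sqsubseteq^{i\omega}_\varphi \Pi_A \implies \text{obstruction-free}^{\mathsf{MGC}}_\varphi(\Pi)$ (B.25)

$\Pi \sqsubseteq_\varphi \Pi_A \land \text{obstruction-free}_\varphi(\Pi) \implies \Pi \sqsubseteq^{i\omega}_\varphi \Pi_A$ (B.26)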

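Proofs of (B.24) For any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$,
for any $T$, if

$T \in \mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$,

we know there exists $T_1$ such that $T = \mathsf{get\_obsv}(T_1)$ and

$T_1 \in \mathcal{T}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$.

Thus there exist $T_1'$ and $T_1''$ such that $T_1'' = T_1 :: T_1'$, where $\mathsf{iso}(T_1')$ holds, and
one of the following holds:

(i) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T_1''}{\longmapsto}{}^{\omega}\ \cdot$; or

(ii) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T_1''}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$; or

(iii) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n \rfloor, (\sigma_c, \sigma_o, \circledast)) \overset{T_1''}{\longmapsto}{}^{*}\ \mathbf{abort}$.

Thus,

$T_1'' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_o, \circledast)]\!]$ and $\mathsf{iso}(T_1'')$.
Since $\Pi \sqsubseteq^{i\omega}_\varphi \Pi_A$, we know there exists $T_2'$ such that

$T_2' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$,

and

$\mathsf{get\_obsv}(T_2') = \mathsf{get\_obsv}(T_1'') = T :: \mathsf{get\_obsv}(T_1')$.
Thus there exists $T_2$ such that

$T_2 \in \mathcal{T}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$,
and $\mathsf{get\_obsv}(T_2) = T$. Thus

$T \in \mathcal{O}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \parallel \ldots \parallel C_n), (\sigma_c, \sigma_a, \circledast)]\!]$,

and we are done.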

Proofs  of  (B.25)  The  proof  is  similar  to  the  proof  of  (B.12). 

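To prove $\text{obstruction-free}^{\mathsf{MGC}}_\varphi(\Pi)$, we want to show: for any $n$, $\sigma_o$, $\sigma_a$ and $T$, if
$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$, $\mathsf{iso}(T)$ and $\varphi(\sigma_o) = \sigma_a$, then the following
(B.14) holds:

$(\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$.

First, if $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$ and $\mathsf{iso}(T)$, by Lemma 11(1),
there exists $T_p$ such that

$T_p \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$, $T_p \backslash (\_, \mathbf{out}, 1) = T$

and $\forall i, t.\ T_p(i) = (t, \mathbf{ret}, \_) \iff T_p(i+1) = (t, \mathbf{out}, 1)$.

Since $\mathsf{iso}(T)$, we know

$|T| = \omega \implies \exists t, i.\ (\forall j.\ j > i \implies \mathsf{tid}(T(j)) = t)$.

If $|T_p| = \omega$, by the generation of $T_p$ and $T_p \backslash (\_, \mathbf{out}, 1) = T$, we know $|T| = \omega$.
Thus there exist $t_0$ and $i$ such that

$\forall j.\ j > i \implies \mathsf{tid}(T(j)) = t_0$.

Since $T_p \backslash (\_, \mathbf{out}, 1) = T$, we know there exists $i_p$ such that

$\forall j.\ j > i_p \implies \mathsf{tid}(T_p(j)) = t_0 \lor T_p(j) = (\_, \mathbf{out}, 1)$.

By the generation of $T_p$, we know there exists $i'$ such that

$\forall j.\ j > i' \implies \mathsf{tid}(T_p(j)) = t_0$.

Thus $\mathsf{iso}(T_p)$ holds.

Since $\Pi \sqsubseteq^{i\omega}_\varphi \Pi_A$, we know

$\mathcal{O}_{i\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!] \subseteq \mathcal{O}_{\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_a, \circledcirc)]\!]$.

From Lemma 12, we know for any $T$, if $T \in \mathcal{O}_{i\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$,
then $T$ is an infinite trace of $(\_, \mathbf{out}, 1)$.

Then we know: $\mathsf{get\_obsv}(T_p)$ is an infinite trace of $(\_, \mathbf{out}, 1)$. Thus $|T_p| = \omega$
and the following (B.15) holds:

$\forall i.\ \exists j.\ j > i \land T_p(j) = (\_, \mathbf{out}, 1)$.

As in the proof of (B.12), we prove the following (B.16) from (B.15):

$\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T_p(j))$.

Since $T_p \backslash (\_, \mathbf{out}, 1) = T$, from (B.16), we know (B.14) holds and we are done.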

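Proofs of (B.26) We need to prove that if $\Pi \sqsubseteq_\varphi \Pi_A$ and $\text{obstruction-free}_\varphi(\Pi)$,
then for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$, we have

$\mathcal{O}_{i\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$

$\subseteq \mathcal{O}_{\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$.

Thus we only need to prove: for any $T$,

(1) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{*}\ \mathbf{abort}$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(2) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(3) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$ and $\mathsf{iso}(T)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.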

(1)  and  (2)  are  proved  in  Lemma  13. 

For (3), as in the proofs for (B.13), we define the simulation relation in
Figure  11(d),  and  prove  the  following  (B.19): 

For  any  W,  S,  Wi,  Si,  W2,  S2,  IF3,  <53,  Tq  and  Ti, 
if  (IF,  5)  is  well-formed  and  out  of  method  calls,  (IF, 5)  1-^*  (lFi,5i), 
(i^i,5i)  (IF2, 52;  1^3,53),  (lFi,5i)  A*"  -  and  lock-free(To ::Ti), 

then  there  exists  T3  such  that  (1F3,53)  •  and 

Ti\(.,obj)=T3\(_,obj). 
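On the other hand, for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$, by Lemma 1, we know
$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$.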

From  n  \—ip  Ua,  by  Lemma  3,  we  know  77  ’^^Ua-  Thus,  if  ^p{oo)  =  Oa,  then 
"HKlet  77  in  MGC„),  (0,  (Tq,  @)|  C  "HKlet  11  a  in  MGG„),  (0,  CTo,  ®)1  • 

Then  we  know 

(let  77  in  Ci  || . . .  ||  C„,  (ctc,  CTo,  ©)) 

(let  Ua  in  MGG„,  (0,  CTo,  ®); 
let  Ua  in  Cl  II . . .  ||  C„,  (ctc,  Ca,  ®)), 

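Thus, if $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$ and $\mathsf{iso}(T)$,
by $\text{obstruction-free}_\varphi(\Pi)$, we know lock-free($T$). Then from (B.19) we get: there
exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$

and $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$. Thus $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$ and we are done.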


B.6  Proofs  of  Theorem  ?? 

We  define  the  MGC  version  of  deadlock-freedom,  and  prove  it  is  equivalent  to 
the  original  version. 

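Definition 10. $\text{deadlock-free}^{\mathsf{MGC}}_\varphi(\Pi)$, iff

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!] \land \mathsf{objfair}(T) \land (\sigma_o \in \mathit{dom}(\varphi))$
$\implies (\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$,

where $\mathsf{objfair}(T)$ says object steps are fairly scheduled:

$\mathsf{objfair}(T) \;\stackrel{\text{def}}{=}\; |T| = \omega \implies$

$(\forall t \in [1..\mathsf{tnum}(T)].\ \forall n.\ |(T|_t)| = n$
$\implies \mathsf{is\_ret}((T|_t)(n)) \lor \mathsf{is\_clt}((T|_t)(n)) \lor (T|_t)(n) = (t, \mathbf{term}))$.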

It is easy to see:

$\forall T.\ \mathsf{fair}(T) \implies \mathsf{objfair}(T)$.
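
As an informal illustration of objfair (not part of the proof), the condition can be sketched in Python for an ultimately periodic trace, represented as a finite prefix followed by a cycle that repeats forever. The tuple encoding of events, the lasso representation, the function name, and the choice to count output events as client events are all assumptions made only for this sketch; threads with an empty projection are treated as trivially satisfying the condition.

```python
# Event kinds that may end a finite per-thread projection under objfair:
# a method return, a client step (assumed here to include outputs), or termination.
OK_LAST = {"ret", "clt", "out", "term"}

def objfair_lasso(prefix, cycle, threads):
    """Check objfair for the infinite trace prefix . cycle^omega.

    Events are tuples (tid, kind, ...). If cycle is empty the trace is
    finite, so the premise |T| = omega fails and objfair holds vacuously.
    """
    if not cycle:
        return True
    running_forever = {e[0] for e in cycle}  # threads with an infinite projection
    for t in threads:
        if t in running_forever:
            continue
        proj = [e for e in prefix if e[0] == t]  # finite projection T|t
        if proj and proj[-1][1] not in OK_LAST:
            return False  # thread t stopped in the middle of a method call
    return True
```

A thread that stops after an invocation or an object step makes the check fail, which is the situation objfair rules out.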

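Lemma 19. For any $T$ and $T_m$, if $\mathsf{fair}(T)$, $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$,
$|T| = \omega$ and

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$,

then $\mathsf{objfair}(T_m)$.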


Proof. Suppose $|(T_m|_t)| = n$ and the index of $(T_m|_t)(n)$ in $T_m$ is $l$. If $\mathsf{is\_ret}(T_m(l))$
or $\mathsf{is\_clt}(T_m(l))$ or $T_m(l) = (t, \mathbf{term})$, we are done. Otherwise, we know

$\mathsf{is\_inv}(T_m(l))$ or $T_m(l) = (t, \mathbf{obj})$.

Since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$, we know there exists $i$ such that
$T(i) = T_m(l)$ and $\mathsf{get\_objevt}(T(1..i)) = \mathsf{get\_objevt}(T_m(1..l))$.

Thus $\mathsf{tid}(T(i)) = t$ and

$\mathsf{is\_inv}(T(i))$ or $T(i) = (t, \mathbf{obj})$.

From $\mathsf{fair}(T)$, we know

$\exists j.\ j > i \land \mathsf{tid}(T(j)) = t$.

By the generation of $T$ and the operational semantics, we know

$\exists j.\ j > i \land \mathsf{tid}(T(j)) = t \land \mathsf{is\_obj}(T(j))$.

Since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$, we know

$\exists j.\ j > l \land \mathsf{tid}(T_m(j)) = t \land \mathsf{is\_obj}(T_m(j))$,

which contradicts the assumption that $|(T_m|_t)| = n$ and the index of $(T_m|_t)(n)$
in $T_m$ is $l$. Thus neither $\mathsf{is\_inv}(T_m(l))$ nor $T_m(l) = (t, \mathbf{obj})$ holds, and we are
done. □

Lemma 20. $\text{deadlock-free}_\varphi(\Pi) \iff \text{deadlock-free}^{\mathsf{MGC}}_\varphi(\Pi)$.

Proof. 1. $\text{deadlock-free}_\varphi(\Pi) \implies \text{deadlock-free}^{\mathsf{MGC}}_\varphi(\Pi)$:

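As in the proof for Lemma 10, we can prove the following (B.9):

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!] \land (\sigma_o \in \mathit{dom}(\varphi)) \land \text{lock-free}(T)$
$\implies (\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$

Then we only need to prove the following (B.27):

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$

$\land\ \mathsf{objfair}(T) \land (\sigma_o \in \mathit{dom}(\varphi)) \land \text{deadlock-free}_\varphi(\Pi)$  (B.27)

$\implies \text{lock-free}(T)$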

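For $T$ such that $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$ and $\mathsf{objfair}(T)$, if $|T| \neq \omega$,
then we know $\mathsf{fair}(T)$. By the definition of $\text{deadlock-free}_\varphi(\Pi)$, we know
lock-free($T$). Otherwise, we know $|T| = \omega$, and let

$S = \{t \mid \exists n.\ |(T|_t)| = n \land (T|_t)(n) \neq (t, \mathbf{term})\}$

Then we construct another program $W = \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n$ as follows:
for any $t \in [1..n]$,

$t \notin S \implies C_t = \mathsf{MGT}$

$t \in S \implies C_t = \mathbf{local}\ i_t;\ i_t := 0;$
$\quad \mathbf{while}\ (i_t < n_t)\{\ f_{\mathsf{rand}(m)}(\mathsf{rand}());\ i_t := i_t + 1\ \}$

where $n_t = |\mathsf{get\_hist}(T|_t)|/2$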
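
The bounded client $C_t$ built for a thread $t \in S$ can be pictured with a small Python sketch: it calls a randomly chosen method with a random argument exactly $n_t$ times and then stops. Modelling $\Pi$ as a list of callables, the helper names, and the integer argument range are assumptions for illustration only.

```python
import random

def completed_calls(thread_history):
    """n_t = |get_hist(T|t)| / 2: each completed call contributes one
    invocation event and one return event to the thread's history."""
    return len(thread_history) // 2

def bounded_client(methods, n_t, rng):
    """Mimic: local i_t; i_t := 0; while (i_t < n_t) { f_rand(m)(rand()); i_t := i_t + 1 }."""
    i_t = 0
    results = []
    while i_t < n_t:
        f = methods[rng.randrange(len(methods))]  # rand(m): pick one of the m methods
        results.append(f(rng.randrange(100)))     # rand(): an arbitrary argument
        i_t += 1
    return results
```

Because objfair guarantees that a finite projection does not end inside a method call, the history of such a thread has even length, so halving it counts completed calls.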
Let $\sigma_c = \{i_t \leadsto 0 \mid t \in S\}$.

We can construct a simulation between $\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n$ and $W$, and show
that there exists $T'$ such that

$T' \in \mathcal{T}_\omega[\![W, (\sigma_c, \sigma_o, \circledcirc)]\!]$, $\mathsf{fair}(T')$ and $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$.

From $\text{deadlock-free}_\varphi(\Pi)$, we know lock-free($T'$). We can prove the following
(B.28):

If $|T| = \omega$, $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$ and lock-free($T'$),
then lock-free($T$).  (B.28)

Then we know lock-free($T$) and hence (B.27) holds.

We prove (B.28) as follows. Since $|T| = \omega$, we know one of the following
must hold:

(i) there exists $i$ such that $\forall j > i.\ \mathsf{is\_clt}(T(j))$;

(ii) $\forall i.\ \exists j.\ j > i \land \mathsf{is\_obj}(T(j))$.

For (i), we know lock-free($T$).

For (ii), since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$, we know
$\forall i.\ \exists j.\ j > i \land \mathsf{is\_obj}(T'(j))$.

Since lock-free($T'$), we know

for any $i'$, if $\mathsf{pend\_inv}(T'(1..i')) \neq \emptyset$, then there exists $j' > i'$ such
that $\mathsf{is\_ret}(T'(j'))$.

For $T$, for any $i$, we know there exists $i'$ such that

$\mathsf{get\_objevt}(T(1..i)) = \mathsf{get\_objevt}(T'(1..i'))$.

If $\mathsf{pend\_inv}(T(1..i)) \neq \emptyset$, we know

$\mathsf{pend\_inv}(T'(1..i')) \neq \emptyset$.

Then we get:

there exists $j' > i'$ such that $\mathsf{is\_ret}(T'(j'))$.

Thus we know:

there exists $j > i$ such that $\mathsf{is\_ret}(T(j))$.

Therefore lock-free($T$) and we have proved (B.28).
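
The reasoning above relies on the set of pending invocations of a trace prefix. As a rough illustration on finite traces only, the sketch below computes that set and checks whether a return occurs after a given position. The tuple event encoding and both function names are assumptions of this sketch, as is the simplification that an invocation by a thread is matched by that thread's next return.

```python
def pend_inv(trace):
    """Invocation events of a finite trace that have no matching return
    later in the trace. Events are tuples (tid, kind, ...)."""
    pending = {}
    for e in trace:
        tid, kind = e[0], e[1]
        if kind == "inv":
            pending[tid] = e        # thread tid now has a call in flight
        elif kind == "ret":
            pending.pop(tid, None)  # the call of thread tid has completed
    return set(pending.values())

def some_return_after(trace, i):
    """True if some event strictly after position i (1-indexed) is a return."""
    return any(e[1] == "ret" for e in trace[i:])
```

On a finite trace these two helpers express the shape of the lock-free condition used in (B.28): whenever `pend_inv` of a prefix is non-empty, some later return must exist.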

2. $\text{deadlock-free}^{\mathsf{MGC}}_\varphi(\Pi) \implies \text{deadlock-free}_\varphi(\Pi)$:

For any $T$ such that

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$,

by Lemma 9, we know one of the following holds:

(1) $|T| \neq \omega$; or

(2) there exists $i$ such that $\forall j > i.\ \mathsf{is\_clt}(T(j))$; or

(3) there exists $T_m$ such that

$T_m \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$,
and $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$.

For (1), by Lemma 8, we know lock-free($T$).

For (2), we know lock-free($T$) holds immediately by definition.

For (3), suppose (1) does not hold. If $\mathsf{fair}(T)$, by Lemma 19, we know
$\mathsf{objfair}(T_m)$ holds. Then from $\text{deadlock-free}^{\mathsf{MGC}}_\varphi(\Pi)$, we know

$(\exists i.\ \mathsf{is\_obj\_abt}(T_m(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T_m(j)))$.

Thus we have:

$(\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$.

If $\exists i.\ \mathsf{is\_obj\_abt}(T(i))$, we know lock-free($T$). Otherwise, we know
$\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j))$.

Thus, for any $i$, if $\mathsf{pend\_inv}(T(1..i)) \neq \emptyset$, then there exists $j > i$ such that
$\mathsf{is\_ret}(T(j))$. Therefore lock-free($T$) and we are done.

□ 

Then,  we  only  need  to  prove  the  following  (B.29),  (B.30)  and  (B.31): 

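$\Pi \sqsubseteq^{f\omega}_\varphi \Pi_A \implies \Pi \sqsubseteq_\varphi \Pi_A$  (B.29)

$\Pi \sqsubseteq^{f\omega}_\varphi \Pi_A \implies \text{deadlock-free}^{\mathsf{MGC}}_\varphi(\Pi)$  (B.30)

$\Pi \sqsubseteq_\varphi \Pi_A \land \text{deadlock-free}_\varphi(\Pi) \implies \Pi \sqsubseteq^{f\omega}_\varphi \Pi_A$  (B.31)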

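Proofs of (B.29) For any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$,
for any $T$ if

$T \in \mathcal{O}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$,

we know there exists $T_1$ such that $T = \mathsf{get\_obsv}(T_1)$ and

$T_1 \in \mathcal{T}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$.

Thus there exist $T_1'$ and $T''$ such that $T'' = T_1 :: T_1'$, where $\mathsf{fair}(T_1')$ holds, and
one of the following holds:

(i) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T''}{\longmapsto}{}^{\omega}\ \cdot$; or

(ii) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T''}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$; or

(iii) $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T''}{\longmapsto}{}^{*}\ \mathbf{abort}$.

Thus,

$T'' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$ and $\mathsf{fair}(T'')$.

Since $\Pi \sqsubseteq^{f\omega}_\varphi \Pi_A$, we know there exists $T_2''$ such that

$T_2'' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$,

and

$\mathsf{get\_obsv}(T_2'') = \mathsf{get\_obsv}(T'') = T :: \mathsf{get\_obsv}(T_1')$.

Thus there exists $T_2$ such that

$T_2 \in \mathcal{T}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$,

and $\mathsf{get\_obsv}(T_2) = T$. Thus

$T \in \mathcal{O}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$,

and we are done.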


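Proofs of (B.30) The proof is similar to the proof of (B.12), except that we
need to first prove the following lemma:

Lemma 21. Suppose $\Pi_A$ is total. If $\Pi \sqsubseteq^{f\omega}_\varphi \Pi_A$, then

$\mathcal{O}_{of\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!] \subseteq \mathcal{O}_{\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_a, \circledcirc)]\!]$,

where

$\mathcal{O}_{of\omega}[\![W, \mathcal{S}]\!] \;\stackrel{\text{def}}{=}\; \{\mathsf{get\_obsv}(T) \mid T \in \mathcal{T}_\omega[\![W, \mathcal{S}]\!] \land \mathsf{objfair}(T)$

$\land\ \forall i, t.\ T(i) = (t, \mathbf{ret}, \_) \iff T(i+1) = (t, \mathbf{out}, 1)\}$.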


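Proof. For any $T$ and $T_o$ such that

$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$, $\mathsf{objfair}(T)$,
$\forall i, t.\ T(i) = (t, \mathbf{ret}, \_) \iff T(i+1) = (t, \mathbf{out}, 1)$,

and $T_o = \mathsf{get\_obsv}(T)$, if $|T| \neq \omega$, we know $\mathsf{fair}(T)$ holds, thus

$T_o \in \mathcal{O}_{f\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$.

From $\Pi \sqsubseteq^{f\omega}_\varphi \Pi_A$, we know

$T_o \in \mathcal{O}_{\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_a, \circledcirc)]\!]$.

Otherwise, we know $|T| = \omega$, and let

$S = \{t \mid \exists n.\ |(T|_t)| = n \land (T|_t)(n) \neq (t, \mathbf{term})\}$

$\phantom{S} = \{t \mid |(T|_t)| \neq \omega\}$.

Since $|T| = \omega$, we know there exists $t$ such that $|(T|_t)| = \omega$ and hence $t \notin S$.
Then we construct another program $W = \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n$ as follows: for
any $t \in [1..n]$,

$t \notin S \implies C_t = \mathsf{MGTp1}$

$t \in S \implies C_t = \mathbf{local}\ i_t;\ i_t := 0;$
$\quad \mathbf{while}\ (i_t < n_t)\{$
$\qquad f_{\mathsf{rand}(m)}(\mathsf{rand}());\ \mathbf{print}(1);\ i_t := i_t + 1;$
$\quad \}$

where $n_t = |\mathsf{get\_hist}(T|_t)|/2$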
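Let $\sigma_c = \{i_t \leadsto 0 \mid t \in S\}$.

We can construct a simulation between $\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n$ and $W$, and show
that there exists $T'$ such that

$T' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$,

$\mathsf{fair}(T')$ and $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T') = T_o$.

Since $\Pi \sqsubseteq^{f\omega}_\varphi \Pi_A$, we know

$T_o \in \mathcal{O}_{\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$.

Thus there exists $T''$ such that

$T'' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$, and $\mathsf{get\_obsv}(T'') = T_o$.

Since there exists $t$ such that $C_t = \mathsf{MGTp1}$, we can construct a simulation and
show that there exists $T'''$ such that

$T''' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_a, \circledcirc)]\!]$,
and $\mathsf{get\_obsv}(T'') = \mathsf{get\_obsv}(T''') = T_o$.

Thus we are done. □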

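To prove $\text{deadlock-free}^{\mathsf{MGC}}_\varphi(\Pi)$, we want to show: for any $n$, $\sigma_o$, $\sigma_a$ and $T$,
if $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$, $\mathsf{objfair}(T)$ and $\varphi(\sigma_o) = \sigma_a$, then the
following (B.14) holds:

$(\exists i.\ \mathsf{is\_obj\_abt}(T(i))) \lor (\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j)))$.

First, if $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$ and $\mathsf{objfair}(T)$, by Lemma 11(1),
there exists $T_p$ such that

$T_p \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$, $T_p \backslash (\_, \mathbf{out}, 1) = T$
and $\forall i, t.\ T_p(i) = (t, \mathbf{ret}, \_) \iff T_p(i+1) = (t, \mathbf{out}, 1)$.

Since $\mathsf{objfair}(T)$, we know $\mathsf{objfair}(T_p)$ also holds.

Since $\Pi \sqsubseteq^{f\omega}_\varphi \Pi_A$, by Lemma 21, we know

$\mathcal{O}_{of\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!] \subseteq \mathcal{O}_{\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_a, \circledcirc)]\!]$.

From Lemma 12, we know for any $T$, if $T \in \mathcal{O}_{of\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$,
then $T$ is an infinite trace of $(\_, \mathbf{out}, 1)$.

Then we know: $\mathsf{get\_obsv}(T_p)$ is an infinite trace of $(\_, \mathbf{out}, 1)$. Thus $|T_p| = \omega$
and the following (B.15) holds:

$\forall i.\ \exists j.\ j > i \land T_p(j) = (\_, \mathbf{out}, 1)$.

As in the proof of (B.12), we prove the following (B.16) from (B.15):

$\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T_p(j))$.

Since $T_p \backslash (\_, \mathbf{out}, 1) = T$, from (B.16), we get (B.14) and thus we are done.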
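Proofs of (B.31) We need to prove that if $\Pi \sqsubseteq_\varphi \Pi_A$ and $\text{deadlock-free}_\varphi(\Pi)$,
then for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$, we have

$\mathcal{O}_{f\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$

$\subseteq \mathcal{O}_{\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$.

Thus we only need to prove: for any $T$,

(1) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{*}\ \mathbf{abort}$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(2) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(3) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$ and $\mathsf{fair}(T)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.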

(1)  and  (2)  are  proved  in  Lemma  13. 

For (3), as in the proofs for (B.13), we define the simulation relation in
Figure  11(d),  and  prove  the  following  (B.19): 

For  any  W,  S,  Wi,  Si,  W2,  S2,  IF3,  ‘^3,  Tq  and  Ti, 
if  (IF,  5)  is  well-formed  and  out  of  method  calls,  (IF, 5)  1-^*  (lFi,5i), 
(kFi,5i)  (1F2,52;1F3,53),  (Wi,Si)  and  lock-free(To ::Ti), 

then  there  exists  T3  such  that  (1F3,53)  •  and 

Ti\(.,obj)=T3\(_,obj). 

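On the other hand, for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$, by Lemma 1, we know
$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$.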
From  77  77^,  by  Lemma  3,  we  know  77  ^^IIa-  Thus,  if  v?(o’o)  =  cr^,  then 

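$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledcirc)]\!]$.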

Then  we  know 

(let  77  in  Cl  II ...  II  C„,  (ctc,  (Tq,  ©)) 

(let  IIa  in  MGG„,  (0,  CTo,  ©); 
let  Ua  in  Cl  II . . .  ||  C„,  {a^,  (Ja,®)), 

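Thus, if $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$ and $\mathsf{fair}(T)$,
by $\text{deadlock-free}_\varphi(\Pi)$, we know lock-free($T$). Then from (B.19) we get: there
exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$

and $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$. Thus $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$ and we are done.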
B.7 Proofs of Theorem ??


We  define  the  MGC  version  of  starvation-freedom,  and  prove  it  is  equivalent  to 
the  original  version. 

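Definition 11. $\text{starvation-free}^{\mathsf{MGC}}_\varphi(\Pi)$, iff

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!] \land \mathsf{objfair}(T) \land (\sigma_o \in \mathit{dom}(\varphi))$
$\implies \text{wait-free}(T)$

Lemma 22. $\text{starvation-free}_\varphi(\Pi) \iff \text{starvation-free}^{\mathsf{MGC}}_\varphi(\Pi)$.

Proof. 1. $\text{starvation-free}_\varphi(\Pi) \implies \text{starvation-free}^{\mathsf{MGC}}_\varphi(\Pi)$: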

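We only need to prove the following (B.32):

$\forall n, \sigma_o, T.\ T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$

$\land\ \mathsf{objfair}(T) \land (\sigma_o \in \mathit{dom}(\varphi)) \land \text{starvation-free}_\varphi(\Pi)$  (B.32)

$\implies \text{wait-free}(T)$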

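For $T$ such that $T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$ and $\mathsf{objfair}(T)$, if $|T| \neq \omega$,
then we know $\mathsf{fair}(T)$. By the definition of $\text{starvation-free}_\varphi(\Pi)$, we know
wait-free($T$). Otherwise, we know $|T| = \omega$, and let

$S = \{t \mid \exists n.\ |(T|_t)| = n \land (T|_t)(n) \neq (t, \mathbf{term})\}$

$\phantom{S} = \{t \mid |(T|_t)| \neq \omega\}$.

Then we construct another program $W = \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n$ as follows:
for any $t \in [1..n]$,

$t \notin S \implies C_t = \mathsf{MGT}$

$t \in S \implies C_t = \mathbf{local}\ i_t;\ i_t := 0;$
$\quad \mathbf{while}\ (i_t < n_t)\{\ f_{\mathsf{rand}(m)}(\mathsf{rand}());\ i_t := i_t + 1\ \}$

where $n_t = |\mathsf{get\_hist}(T|_t)|/2$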

Let $\sigma_c = \{i_t \leadsto 0 \mid t \in S\}$.

We can construct a simulation between $\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n$ and $W$, and show
that there exists $T'$ such that

$T' \in \mathcal{T}_\omega[\![W, (\sigma_c, \sigma_o, \circledcirc)]\!]$, $\mathsf{fair}(T')$ and $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$.

From $\text{starvation-free}_\varphi(\Pi)$, we know wait-free($T'$). We can prove the following
(B.33):

If $|T| = \omega$, $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$ and wait-free($T'$),
then wait-free($T$).

Then we know wait-free($T$) and hence (B.32) holds.

We prove (B.33) as follows. Since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$, for any $i$,
we know there exists $i'$ such that

$\mathsf{get\_objevt}(T(1..i)) = \mathsf{get\_objevt}(T'(1..i'))$.

For any $e$, if $e \in \mathsf{pend\_inv}(T(1..i))$, we know

$e \in \mathsf{pend\_inv}(T'(1..i'))$.

From wait-free($T'$), we know one of the following holds:

(i) there exists $j' > i'$ such that $\mathsf{match}(e, T'(j'))$.

(ii) there exists $j' > i'$ such that $\forall k' > j'.\ \mathsf{tid}(T'(k')) \neq \mathsf{tid}(e)$.

For (i), since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$, we know

there exists $j > i$ such that $\mathsf{match}(e, T(j))$.

For (ii), assume (i) does not hold. Then we know $e \in \mathsf{pend\_inv}(T')$. Since
$\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$, we can prove

$e \in \mathsf{pend\_inv}(T)$.

Let $t = \mathsf{tid}(e)$. Suppose

$\forall j > i.\ \exists k > j.\ \mathsf{tid}(T(k)) = t$.

Then, by the operational semantics and the generation of $T$, we know
$\forall j > i.\ \exists k > j.\ T(k) = (t, \mathbf{obj})$.

Since $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T')$, we know

$\forall j' > i'.\ \exists k' > j'.\ T'(k') = (t, \mathbf{obj})$,

which contradicts (ii). Thus we know

$\exists j > i.\ \forall k > j.\ \mathsf{tid}(T(k)) \neq t$.

Therefore wait-free($T$) and we have proved (B.33).

2. $\text{starvation-free}^{\mathsf{MGC}}_\varphi(\Pi) \implies \text{starvation-free}_\varphi(\Pi)$:

Almost  the  same  as  the  proof  for  Lemma  15,  except  that  we  need  to  apply 
Lemma  19. 

□ 

Then,  we  only  need  to  prove  the  following  (B.34),  (B.35)  and  (B.36),  where 
(B.34)  is  trivial  from  definitions: 

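$\Pi \sqsubseteq^{ft\omega}_\varphi \Pi_A \implies \Pi \sqsubseteq_\varphi \Pi_A$  (B.34)

$\Pi \sqsubseteq^{ft\omega}_\varphi \Pi_A \implies \text{starvation-free}^{\mathsf{MGC}}_\varphi(\Pi)$  (B.35)

$\Pi \sqsubseteq_\varphi \Pi_A \land \text{starvation-free}_\varphi(\Pi) \implies \Pi \sqsubseteq^{ft\omega}_\varphi \Pi_A$  (B.36)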
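Proofs of (B.35) For any $n$, $\sigma_o$, $\sigma_a$ and $T$ such that $\varphi(\sigma_o) = \sigma_a$, if
$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$ and $\mathsf{objfair}(T)$, suppose

$\neg \exists i.\ \mathsf{is\_abt}(T(i))$,

then by the operational semantics, we only need to prove:

for any $i$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..i))$, then there exists $j > i$ such that
$\mathsf{match}(e, T(j))$.

Suppose it does not hold. Then we know there exists $t_0$ such that

$\exists i.\ \forall j.\ j > i \implies (T|_{t_0})(j) = (t_0, \mathbf{obj})$.

By Lemma 11(1), there exists $T_p$ such that

$T_p \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$, $T_p \backslash (\_, \mathbf{out}, 1) = T$
and $\forall i, t.\ T_p(i) = (t, \mathbf{ret}, \_) \iff T_p(i+1) = (t, \mathbf{out}, 1)$.

By the operational semantics, we know

$\exists i.\ \forall j.\ j > i \implies (T_p|_{t_0})(j) = (t_0, \mathbf{obj})$.

Let

$S = \{t \mid \exists n.\ |(T_p|_t)| = n \land (T_p|_t)(n) \neq (t, \mathbf{term})\}$
$\phantom{S} = \{t \mid |(T_p|_t)| \neq \omega\}$.

Thus we know

$t_0 \notin S$, and $t_0 \in \mathsf{div\_tids}(T_p)$.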

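We construct another program $W = \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n$ as follows: for any
$t \in [1..n]$,

$t \notin S \implies C_t = \mathsf{MGTp1}$

$t \in S \implies C_t = \mathbf{local}\ i_t;\ i_t := 0;$
$\quad \mathbf{while}\ (i_t < n_t)\{$
$\qquad f_{\mathsf{rand}(m)}(\mathsf{rand}());\ \mathbf{print}(1);\ i_t := i_t + 1;$
$\quad \}$

where $n_t = |\mathsf{get\_hist}(T|_t)|/2$

Let $\sigma_c = \{i_t \leadsto 0 \mid t \in S\}$. We can construct a simulation between $\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGCp1}_n$
and $W$, and show that there exists $T_p'$ such that

$T_p' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$, $\mathsf{fair}(T_p')$,
$\mathsf{get\_objevt}(T_p) = \mathsf{get\_objevt}(T_p')$ and $\mathsf{get\_obsv}(T_p) = \mathsf{get\_obsv}(T_p')$.

Thus we know there exists $i$ such that

$\forall j.\ j > i \implies (T_p'|_{t_0})(j) = (t_0, \mathbf{obj})$.

Thus we have

$t_0 \in \mathsf{div\_tids}(T_p')$ and $|(\mathsf{get\_obsv}(T_p')|_{t_0})| < i$.

On the other hand, since $\Pi \sqsubseteq^{ft\omega}_\varphi \Pi_A$, we know:

$\mathcal{O}_{ft\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$
$\subseteq \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$.

Thus there exists $T_p''$ such that

$T_p'' \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$,
$\mathsf{get\_obsv}(T_p'') = \mathsf{get\_obsv}(T_p')$ and $\mathsf{div\_tids}(T_p'') = \mathsf{div\_tids}(T_p')$.

Since $C_{t_0} = \mathsf{MGTp1}$ and $t_0 \in \mathsf{div\_tids}(T_p'')$, we know

$|(T_p''|_{t_0})| = \omega$,

and also

$|(\mathsf{get\_obsv}(T_p')|_{t_0})| = |(\mathsf{get\_obsv}(T_p'')|_{t_0})| = \omega$,

which contradicts the fact that $|(\mathsf{get\_obsv}(T_p')|_{t_0})| < i$. Thus we know wait-free($T$)
and we are done.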

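Proofs of (B.36) We need to prove that if $\Pi \sqsubseteq_\varphi \Pi_A$ and $\text{starvation-free}_\varphi(\Pi)$,
then for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$ such that $\varphi(\sigma_o) = \sigma_a$, we have

$\mathcal{O}_{ft\omega}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!]$

$\subseteq \mathcal{O}_{t\omega}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_a, \circledcirc)]\!]$.

Thus we only need to prove: for any $T$,

(1) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{*}\ \mathbf{abort}$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{*}\ \mathbf{abort}$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(2) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$ and
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(3) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$ and $\mathsf{fair}(T)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$,
$\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$ and $\mathsf{div\_tids}(T) = \mathsf{div\_tids}(T_a)$.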

(1)  and  (2)  are  proved  in  Lemma  13. 

For (3), as in the proofs for (B.22), we define the simulation relation in
Figure  11(d),  and  prove  the  following  (B.23): 
For any $W$, $S$, $W_1$, $S_1$, $W_2$, $S_2$, $W_3$, $S_3$, $T_0$ and $T_1$,

if  (VF,  5)  is  well- formed  and  out  of  method  calls,  (IF, *S)  1-^*  (VFij^i), 

(TFi,5i)  {W2,S2;W3,S3),  (fFi,5i)  A*"  -  and  wait-free(To iiTi), 

then  there  exists  T3  such  that  (1^3,53)  •  and 

Ti\(_,obj)  =  T3\(_,obj). 

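On the other hand, for any $n$, $C_1, \ldots, C_n$, $\sigma_c$, $\sigma_o$ and $\sigma_a$, by Lemma 1, we know
$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n), (\sigma_c, \sigma_o, \circledcirc)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!]$.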
From  77  Ua,  by  Lemma  3,  we  know  77  ^^^IIa-  Thus,  if  ^p{(Jo)  =  Oa,  then 
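$\mathcal{H}[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_o, \circledcirc)]\!] \subseteq \mathcal{H}[\![(\mathbf{let}\ \Pi_A\ \mathbf{in}\ \mathsf{MGC}_n), (\emptyset, \sigma_a, \circledcirc)]\!]$.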

Then  we  know 

(let  77  in  Cl  II ...  II  C„,  (ctc,  (Tq,  ©)) 

(let  Ua  in  MGG„,  (0,  CTo,  ©); 
let  Ua  in  Cl  II . . .  ||  C„,  (ctc,  Ca,  ©)), 

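Thus, if $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$ and $\mathsf{fair}(T)$,
by $\text{starvation-free}_\varphi(\Pi)$, we know wait-free($T$). Then from (B.23) we get: there
exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \| \ldots \| C_n \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$, and $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$.

Thus we know $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

Below we prove: $\mathsf{div\_tids}(T) = \mathsf{div\_tids}(T_a)$. Since $\mathsf{fair}(T)$ and $|T| = \omega$, we
know for any $t$,

either $|(T|_t)| = \omega$, or $\mathsf{last}(T|_t) = (t, \mathbf{term})$.

(a) $\mathsf{last}(T|_t) = (t, \mathbf{term})$:

Since $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$ and by the operational semantics, we know
$\mathsf{last}(T_a|_t) = (t, \mathbf{term})$.

(b) $|(T|_t)| = \omega$:

Since $T \backslash (\_, \mathbf{obj}) = T_a \backslash (\_, \mathbf{obj})$, we know

$(T|_t) \backslash (t, \mathbf{obj}) = (T_a|_t) \backslash (t, \mathbf{obj})$.

Suppose $|(T_a|_t)| \neq \omega$. Then we know $|(T_a|_t) \backslash (t, \mathbf{obj})| \neq \omega$. Thus
$\exists i.\ \forall j.\ j > i \implies (T|_t)(j) = (t, \mathbf{obj})$.

By the operational semantics, we know there exists $i$ such that

$\mathsf{tid}(T(i)) = t$, $\mathsf{is\_inv}(T(i))$, and $\forall j.\ j > i \implies \neg\mathsf{match}(T(i), T(j))$.

By wait-free($T$), we know

$\exists j.\ \forall k > j.\ \mathsf{tid}(T(k)) \neq t$,

which contradicts the assumption that $|(T|_t)| = \omega$.

Thus we know $|(T_a|_t)| = \omega$.

Thus $\mathsf{div\_tids}(T) = \mathsf{div\_tids}(T_a)$ holds and we are done.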
B.8 Proofs of Theorem ??


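Proofs of Theorem ??(1) For any $\sigma_o$, $\sigma_a$ and $T$ such that $\varphi(\sigma_o) = \sigma_a$, if
$T \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ C_1), (\sigma_c, \sigma_o, \circledcirc)]\!]$,
by Lemma 9, we know one of the following holds:

(1) $|T| \neq \omega$; or

(2) there exists $i$ such that $\forall j > i.\ \mathsf{is\_clt}(T(j))$; or

(3) there exists $T_m$ such that

$T_m \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGT}), (\emptyset, \sigma_o, \circledcirc)]\!]$,
and $\mathsf{get\_objevt}(T) = \mathsf{get\_objevt}(T_m)$.

For (1), by the operational semantics, we can prove prog-t($T$) or abt($T$) holds.
For (2), for any $k$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..k))$, since there exists $i > k$ such
that $\mathsf{is\_clt}(T(i))$, by the operational semantics we know there exists $j$ such that
$k < j < i$ and $\mathsf{match}(e, T(j))$. Thus prog-t($T$) holds.

For (3), by Lemma 11(1), there exists $T_p$ such that

$T_p \in \mathcal{T}_\omega[\![(\mathbf{let}\ \Pi\ \mathbf{in}\ \mathsf{MGTp1}), (\emptyset, \sigma_o, \circledcirc)]\!]$ and $T_p \backslash (\_, \mathbf{out}, 1) = T$.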

Since  77  11  a,  we  know 

O^^Klet  77  in  MGTpl),  (0,  CTo,®)]  C  O^^Klet  Ua  in  MGTpl),  (0,(Ta,®)] . 

From Lemma 12, we know $\mathsf{get\_obsv}(T_p)$ is an infinite trace of $(\_, \mathbf{out}, 1)$. Thus
$|T_p| = \omega$ and the following (B.15) holds:

$\forall i.\ \exists j.\ j > i \land T_p(j) = (\_, \mathbf{out}, 1)$.

As in the proof of (B.12), we prove the following (B.16) from (B.15):

$\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T_p(j))$.

Since $T_p \backslash (\_, \mathbf{out}, 1) = T$, we know

$\forall i.\ \exists j.\ j > i \land \mathsf{is\_ret}(T(j))$.

Thus for any $i$ and $e$, if $e \in \mathsf{pend\_inv}(T(1..i))$, then there exists $j > i$ such that
$\mathsf{is\_ret}(T(j))$ holds. By the operational semantics and the generation of $T$, we
know $\mathsf{match}(e, T(j))$ holds. Thus prog-t($T$) holds. Then we are done.


Proofs  of  Theorem  ??(2)  We  need  to  prove  that  if  77  77^  and  seq-term^(77), 

then  for  any  Ci,  Uc,  o'a  and  Ca  such  that  (p{cro)  =  o'a,  we  have 

e><^|(let  77  in  Ci),  (ctc,  Co,  ®)1  C  e>c„|(let  Ua  in  Ci),  (ctc,  Ca,  ®)1  • 

Thus  we  only  need  to  prove:  for  any  T, 
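(1) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{*}\ \mathbf{abort}$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{*}\ \mathbf{abort}$ and $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(2) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{*}\ (\mathbf{skip}, \_)$ and $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.

(3) If $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$,
then there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$ and $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$.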

(1)  and  (2)  are  proved  in  Lemma  13. 

For (3), as in the proofs for (B.13), we define the simulation relation in
Figure  11(d),  and  prove  the  following  (B.19): 

For  any  W,  S,  Wi,  Si,  W2,  S2,  IF3,  S3,  Tq  and  Ti, 

if  (IF,  5)  is  well-formed  and  out  of  method  calls,  (IF, 5)  1-^*  (lFi,5i), 

(W^i,5i)  (1F2,52;IF3,*S3),  (lFi,5i)  A*"  -  and  lock-free(To ::Ti), 

then  there  exists  T3  such  that  (1F3,53)  •  and 

Ti\(_,obj)  =  T3\(_,obj). 

On  the  other  hand,  for  any  n,  Ci,  . . . ,  Cn,  Oc  cFq  and  a  a,  by  Lemma  1,  we  know 
77|(let  77  in  Ci),  (ctc,  CTo,  ®)1  C  "HKlet  77  in  MGCi),  (0,  Uo,  ©)]  • 

From  77  IT  a,  by  Lemma  3,  we  know  77  ^^pIlA-  Thus,  if  (p{(Jo)  =  ^a,  then 

"HKlet  77  in  MGCi),  (0,  Co,  ©)]  C  "HKlet  Ua  in  MGGi),  (0,  Ua,  ©)] . 

Then  we  know 

(let  77  in  Cl,  {(Jc,(Jo,@)) 

:<  (let  IIa  in  MGGi,  (0,  Ua,  ©); 
let  IIa  in  Ci,  (ctc,  Ua,  ©)), 

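Thus, if $(\lfloor \mathbf{let}\ \Pi\ \mathbf{in}\ C_1 \rfloor, (\sigma_c, \sigma_o, \circledcirc)) \overset{T}{\longmapsto}{}^{\omega}\ \cdot$, by $\text{seq-term}_\varphi(\Pi)$, we know lock-free($T$).

Then from (B.19) we get: there exists $T_a$ such that

$(\lfloor \mathbf{let}\ \Pi_A\ \mathbf{in}\ C_1 \rfloor, (\sigma_c, \sigma_a, \circledcirc)) \overset{T_a}{\longmapsto}{}^{\omega}\ \cdot$

and $\mathsf{get\_obsv}(T) = \mathsf{get\_obsv}(T_a)$, thus we are done.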
A Separation Logic for Enforcing Declarative
Information Flow Control Policies


David  Costanzo  and  Zhong  Shao 
Yale  University 


Abstract. In this paper, we present a program logic for proving that a
program does not release information about sensitive data in an unintended
way. The most important feature of the logic is that it provides
a formal security guarantee while supporting “declassification policies”
that describe precise conditions under which a piece of sensitive data
can be released. We leverage the power of Hoare Logic to express the
policies and security guarantee in terms of state predicates. This allows
our system to be far more specific regarding declassification conditions
than most other information flow systems.

The  logic  is  designed  for  reasoning  about  a  C-like,  imperative  language 
with  pointer  manipulation  and  aliasing.  We  therefore  make  use  of  ideas 
from  Separation  Logic  to  reason  about  data  in  the  heap. 


1  Introduction 

Information  Flow  Control  (IFC)  is  a  field  of  computer  security  concerned  with 
tracking  the  propagation  of  information  through  a  system.  A  primary  goal  of 
IFC  reasoning  is  to  formally  prove  that  a  system  does  not  inadvertently  leak 
high-security data to a low-security observer. A major challenge is to precisely
define what “inadvertently” should mean here.

A simple solution to this challenge, taken by many IFC systems (e.g., [4, 5,
11, 16, 19]), is to define an information-release policy using a lattice of security
labels.  A  noninterference  property  is  imposed:  information  cannot  flow  down 
the  lattice.  Put  another  way,  any  data  that  the  observer  sees  can  only  have  been 
influenced  by  data  with  label  less  than  or  equal  to  the  observer’s  label  in  the 
lattice.  This  property  is  sometimes  called  pure  noninterference. 

Purely noninterfering systems are unfortunately not very useful. Almost all
real-world systems need to violate noninterference sometimes. For example,
consider one of the most standard security-sensitive situations: password
authentication. In order for a password to be useful, there must be a way for
a user to submit a guess at the password. If the guess is incorrect, then the
user will be informed as such. However, the information that the guess was
incorrect is dependent on the password itself; the user (who might be a
malicious attacker) learns that the password is definitely not the one that
was guessed. This represents a flow of information (albeit a minor one) from
the high-security password to the low-security user, thus violating
noninterference. In a purely noninterfering system, sensitive data has no way
whatsoever of affecting the outcome of a computation, and so the situation is
essentially equivalent to the data not being present in the system at all.

There  have  been  numerous  attempts  at  refining  the  notion  of  inadvertent 
information  release  beyond  the  rules  of  a  strict  lattice  structure.  IFC  systems 
commonly  allow  for  some  method  of  declassification,  a  term  used  to  describe  an 
information  leak  (i.e.,  an  information  flow  moving  down  the  security  lattice)  that 
is  understood  to  be  in  some  way  “acceptable”  or  “purposeful”  (as  opposed  to 
“inadvertent”).  These  declassifications  violate  the  pure  noninterference  property 
described  above.  Ideally,  an  IFC  system  should  still  provide  some  sort  of  security 
guarantee  even  in  the  presence  of  declassification.  It  is  quite  rare,  however,  for  a 
system  to  have  a  satisfactory  formal  guarantee.  Those  that  do  usually  must  make 
significant  concessions  that  limit  the  generality  and  usefulness  of  the  system. 

Our  goal  is  to  leverage  the  strengths  of  a  program  logic  to  devise  a  powerful 
IFC  system  that  provides  formal  security  guarantees  even  in  the  presence  of 
declassification.  It  turns  out  that  we  can  use  state  predicates  to  refine  the  pure 
noninterference  property  into  one  that  cleanly  describes  exactly  how  a  piece  of 
high-security  data  could  affect  observable  output.  Instead  of  simply  saying  that 
an  observer  cannot  distinguish  between  any  values  of  the  high-security  data,  we 
say  that  the  observer  cannot  distinguish  between  any  values  among  a  particular 
set  —  the  set  described  by  the  state  predicate. 

Our  contributions  in  this  paper  are  as  follows: 

—  We  define  a  novel,  security-aware  semantics  for  a  simple  imperative  language 
with  pointer  arithmetic  and  aliasing  that  tracks  information  flow  through 
label  propagation.  We  show  that  this  semantics  is  sensible  by  relating  its 
executions back to a standard (security-ignorant) small-step operational
semantics.

—  We  present  a  program  logic  for  formally  verifying  the  safety  of  a  program 
under  the  security-aware  semantics.  The  logic  builds  on  ideas  from  Hoare 
Logic  [6]  and  Separation  Logic  [13, 14]. 

—  We  prove  a  strong  security  guarantee  for  any  program  that  is  verified  using 
our  program  logic.  This  guarantee  is  a  generalization  of  traditional  pure 
noninterference. 

—  All  of  the  technical  work  in  this  paper  is  fully  formalized  and  proved  in  the 
Coq  proof  assistant. 

The  remainder  of  this  paper  is  organized  as  follows:  Section  2  informally 
discusses  how  our  system  works  and  highlights  contributions;  Section  3  defines 
our  language,  state  model,  and  operational  semantics;  Section  4  describes  the 
program  logic  and  its  soundness  theorem  relative  to  the  operational  semantics; 
Section  5  describes  the  noninterference-based  security  guarantee  provided  by  the 
program  logic;  and  Section  6  describes  related  work  and  concludes. 

2  Informal  Discussion 

In this section, we will describe our system informally in order to provide some
high-level motivation. We pick a starting point of a C-like, imperative language
with pointer arithmetic and aliasing, as we would like our logic to be applicable
to low-level systems code. The main operations of our language are variable
assignment x := E, heap dereference/load x := [E], and heap dereference/store
[E] := E'. The expressions E can be any standard mathematical expressions
on program variables, so pointer arithmetic is allowed. Aliasing is also clearly
allowed since [x] and [y] refer to the same heap location if x and y contain the
same value.

2.1  Security  Labels 

Our  language  semantics  will  track  information  flow  by  attaching  a  security  label 
to  every  value  in  the  program  state.  For  simplicity  of  presentation,  we  will  assume 
that  the  only  labels  are  Lo  and  Hi  (a  more  general  version  of  our  system  allows 
labels  to  be  any  set  of  elements  that  form  a  lattice  structure).  Unlike  many  IFC 
systems,  we  attach  the  label  to  the  value  rather  than  the  location.  This  means 
that  a  program  is  allowed  to,  for  example,  overwrite  some  Lo  data  stored  in 
variable  x  with  some  other  Hi  data.  Many  other  systems  would  instead  label  the 
location  x  as  Lo,  meaning  that  Hi  data  could  never  be  written  into  it.  Supporting 
label  overwrites  allows  our  system  to  verify  a  wider  variety  of  programs. 

Label propagation is done in a mostly obvious way. If we have a direct
assignment such as x := y, then the label of y’s data propagates into x along
with the data itself. We compute the composite label of an expression such as
2 * x + z to be the least upper bound of the labels of its constituent parts (for the
two-element lattice of Lo and Hi, this will be Lo if and only if each constituent
label is Lo). For the heap-read command x := [E], we must propagate both the
label of E and the label of the data located at heap address E into x. In other
words, if we read some low-security data from the heap using a high-security
pointer, the result must be tainted as high security in order for our information
flow tracking to be accurate. Similarly, the heap-write command [E] := E' must
propagate both the label of E' and the label of pointer E into the location E in
the heap. As a general rule for any of these atomic commands, we compute the
composite label of the entire read-set, and propagate that into all locations in
the write-set.
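
The propagation rule above can be illustrated with a minimal Python sketch. This is not the paper's Coq formalization: the names Label, join, assign, heap_read, and heap_write are hypothetical, stores and heaps are modeled as dictionaries from names or addresses to (value, label) pairs, expressions are restricted to single variables for brevity, and the pc label introduced in Section 3 is left out.

```python
from enum import IntEnum


class Label(IntEnum):
    """Two-point security lattice with Lo below Hi."""
    LO = 0
    HI = 1


def join(*labels):
    """Least upper bound; it is Lo only if every argument is Lo."""
    return max(labels, default=Label.LO)


def assign(store, x, y):
    """x := y. The label of y travels into x along with the data."""
    value, label = store[y]
    store[x] = (value, label)


def heap_read(store, heap, x, p):
    """x := [p]. Both the pointer's label and the cell's label taint x."""
    addr, ptr_label = store[p]
    value, cell_label = heap[addr]
    store[x] = (value, join(ptr_label, cell_label))


def heap_write(store, heap, p, y):
    """[p] := y. The written cell is tainted by the pointer and by the data."""
    addr, ptr_label = store[p]
    value, data_label = store[y]
    if addr not in heap:
        raise KeyError(f"address {addr} is not allocated")
    heap[addr] = (value, join(ptr_label, data_label))


store = {"p": (100, Label.HI), "q": (101, Label.LO), "y": (7, Label.LO)}
heap = {100: (3, Label.LO), 101: (0, Label.LO)}

heap_read(store, heap, "x", "p")   # Lo data read through a Hi pointer
heap_write(store, heap, "q", "x")  # Hi data stored through a Lo pointer
assign(store, "z", "y")            # Lo data copied as-is
```

In this run, x ends up labeled Hi even though the cell it was read from held Lo data, and the cell at address 101 becomes Hi once the tainted x is written into it. In each case the label written is the join over the whole read-set.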

2.2  Noninterference 

As discussed in Section 1, the ultimate goal of our IFC system is to prove a formal
security  guarantee  that  holds  for  any  verified  program.  The  standard  security 
guarantee  is  noninterference,  which  says  that  the  initial  values  of  Hi  data  have 
no  effect  on  the  “observable  behavior”  of  a  program’s  execution.  We  choose  to 
define  observable  behavior  in  terms  of  a  special  output  channel.  We  include  an 
output  command  in  our  language,  and  an  execution’s  observable  behavior  is 
defined  to  be  exactly  the  sequence  of  values  that  the  execution  outputs. 

The standard way to express this noninterference property formally is in
terms of two executions: a program is deemed to be noninterfering if two
executions of the program from observably equivalent initial states always yield
identical outputs. Two states are defined to be observably equivalent when only
their high-security values differ. Thus this property describes what one would
expect: changing the value of any high-security data in the initial state will cause
no change in the program’s output.

One  of  our  key  insights  is  that  this  noninterference  property  can  be  refined  by 
requiring  a  precondition  to  hold  on  the  initial  state  of  an  execution.  That  is,  we 
alter  the  property  to  say  that  two  executions  will  yield  identical  outputs  if  they 
start  from  two  observably  equivalent  states  that  both  satisfy  some  state  predicate 
P.  This  weakening  of  noninterference  is  interesting  for  two  reasons.  First,  it 
provides  a  link  between  information  flow  security  and  Hoare  Logic  (a  program 
logic  that  derives  pre/postconditions  as  state  predicates).  Second,  this  property 
describes  a  certain  level  of  dependency  between  high-security  inputs  and  low- 
security  outputs,  rather  than  the  complete  independence  of  pure  noninterference. 
This  means  that  a  program  that  satisfies  this  weaker  noninterference  may  be 
semantically  declassifying  data.  In  this  sense,  we  can  use  this  property  as  an 
interesting  security  guarantee  for  a  program  that  may  declassify  some  data. 

To  better  understand  this  weaker  version  of  noninterference,  let  us  consider 
a  few  examples. 

Public Parity. Suppose we have a variable x that contains some high-security
data. We wish to specify a declassification policy which says that only the parity
of the Hi value can be released to the public. We will accomplish this by verifying
the security of some program with a precondition P that says “x contains high
data, y contains low data, and y = x % 2”. Our security property then says that
if we have an execution from some state satisfying P, then changing the value of
x will not affect the output as long as the new state also satisfies P. Since y is
the parity of x and is unchanged in the two executions, this means that as long
as we change x to some other value that has the same parity, the output will be
unchanged. Indeed, this is exactly the property that one would expect to have
with a policy that releases only the parity of a secret value: only the secret’s
parity can influence the observable behavior.
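
To make the precondition-relative property concrete, the following Python sketch brute-forces it over a small finite set of states. It is only an illustration under stated assumptions: programs are modeled as plain Python functions from a state to a list of outputs rather than as executions of the security-aware semantics, and the names observably_equivalent, noninterfering, parity_policy, release_parity, and release_secret are all hypothetical.

```python
from itertools import product

LO, HI = "Lo", "Hi"


def observably_equivalent(s1, s2):
    """Same variables and labels, and equal values wherever the label is Lo."""
    if s1.keys() != s2.keys():
        return False
    for var in s1:
        (v1, l1), (v2, l2) = s1[var], s2[var]
        if l1 != l2 or (l1 == LO and v1 != v2):
            return False
    return True


def noninterfering(program, precondition, states):
    """Compare outputs on every equivalent pair that satisfies the precondition."""
    allowed = [s for s in states if precondition(s)]
    return all(
        program(a) == program(b)
        for a, b in product(allowed, repeat=2)
        if observably_equivalent(a, b)
    )


def parity_policy(state):
    """P: x is Hi, y is Lo, and y = x % 2."""
    (x, x_label), (y, y_label) = state["x"], state["y"]
    return x_label == HI and y_label == LO and y == x % 2


def release_parity(state):
    return [state["y"][0]]  # outputs only the public parity


def release_secret(state):
    return [state["x"][0]]  # outputs the secret itself


states = [{"x": (x, HI), "y": (y, LO)} for x in range(8) for y in range(2)]

parity_ok = noninterfering(release_parity, parity_policy, states)
secret_ok = noninterfering(release_secret, parity_policy, states)
```

Here parity_ok holds, because two states that agree on y and both satisfy P can differ only in secrets of equal parity. secret_ok fails, since x = 0 and x = 2 are indistinguishable under the policy yet produce different outputs.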

Public Average. Suppose we have three secrets stored in x, y, and z, and we
are only willing to release their average as public (e.g., the secrets are employee
salaries at a particular company). This is similar to the previous example, except
that we now have multiple secrets. The precondition P will say that x, y, and z
all contain Hi data, a contains Lo data, and a = (x + y + z)/3. In this situation,
noninterference will say that we can change the value of the set of secrets from
any triple to any other triple, and the output will be unaffected as long as the
average of the three values is unchanged.

Public Zero. Suppose we have a secret stored in x, and we are only willing to
release it if it is zero. We could take the approach of the previous two examples
and store a public boolean in another variable which is true if and only if x is 0.
However, there is an even simpler way to represent the desired policy without
using an extra variable. Our precondition P will say that either x is 0 and
its label is Lo, or x is nonzero and its label is Hi. This is an example of a
conditional label: a label whose value depends on some state predicate. If x is
0, then noninterference says nothing since there is no high-security data in the
state. If x is nonzero, then noninterference says that changing its value (but not
its label) will have no effect on the output as long as P still holds; in order for
P to still hold, we must be changing x to some other nonzero value. Hence all
nonzero values of x will look the same to an observer. Conditional labels are a
novelty of our system; we will see in Section 4 how they can be a powerful tool
for verifying the security of a program.


3  Language  and  Semantics 


Our  programming  language  is  defined  as  follows: 


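\[
\begin{array}{llcl}
(\mathit{Exp})  & E & ::= & x \mid c \mid E + E \mid \cdots \\
(\mathit{BExp}) & B & ::= & \mathsf{false} \mid E = E \mid B \wedge B \\
(\mathit{Cmd})  & C & ::= & \mathsf{skip} \mid \mathsf{output}\ E \mid x := E \mid x := [E] \mid [E] := E \mid C; C \\
                &   & \mid & \mathsf{if}\ B\ \mathsf{then}\ C\ \mathsf{else}\ C \mid \mathsf{while}\ B\ \mathsf{do}\ C
\end{array}
\]
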


Valid  code  includes  variable  assignment,  heap  load/store,  if  statements,  while 
loops,  and  output.  Our  model  of  a  program  state,  consisting  of  a  variable  store 
and  a  heap,  is  given  by: 


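\[
\begin{array}{llcl}
(\mathit{Lbl})   & L      & ::= & \mathsf{Lo} \mid \mathsf{Hi} \\
(\mathit{Val})   & V      & ::= & \mathbb{Z} \times \mathit{Lbl} \\
(\mathit{Store}) & s      & ::= & \mathit{Var} \to \mathsf{option}\ \mathit{Val} \\
(\mathit{Heap})  & h      & ::= & \mathbb{N} \to \mathsf{option}\ \mathit{Val} \\
(\mathit{State}) & \sigma & ::= & (s, h)
\end{array}
\]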

Given a variable store s, we define a denotational semantics $\llbracket E \rrbracket s$ that evaluates
an expression to a pair of integer and label, with the label being the least upper
bound of the labels of the constituent parts. The denotation of an expression
also may evaluate to None, indicating that the program state does not contain
the necessary resources to evaluate. We have a similar denotational semantics for
boolean expressions. The formal definitions of these semantics are omitted here
as they are standard and straightforward. Note that we will sometimes write
$\llbracket E \rrbracket \sigma$ as shorthand for $\llbracket E \rrbracket$ applied to the store of state $\sigma$.
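
Since the formal definition is omitted, here is one way such an evaluator could look, as a minimal Python sketch. Everything about the encoding is an assumption made for illustration: expressions are nested tuples, only addition and multiplication are supported, constants are given the label Lo, and Python's None stands in for the paper's None result. The name eval_exp is hypothetical.

```python
from enum import IntEnum
from typing import Optional, Tuple


class Label(IntEnum):
    LO = 0
    HI = 1


def eval_exp(exp, store) -> Optional[Tuple[int, Label]]:
    """Evaluate a tuple-encoded expression to (integer, label), or None."""
    kind = exp[0]
    if kind == "const":
        return (exp[1], Label.LO)        # assumption: constants are Lo
    if kind == "var":
        return store.get(exp[1])         # None if the variable is unbound
    if kind in ("add", "mul"):
        left = eval_exp(exp[1], store)
        right = eval_exp(exp[2], store)
        if left is None or right is None:
            return None                  # a missing resource propagates
        (n1, l1), (n2, l2) = left, right
        value = n1 + n2 if kind == "add" else n1 * n2
        return (value, max(l1, l2))      # least upper bound of the labels
    raise ValueError(f"unknown expression form: {kind}")


# The expression 2 * x + z, with x holding Lo data and z holding Hi data.
expr = ("add", ("mul", ("const", 2), ("var", "x")), ("var", "z"))
store = {"x": (5, Label.LO), "z": (1, Label.HI)}

result = eval_exp(expr, store)                  # value 11, tainted by z
missing = eval_exp(expr, {"x": (5, Label.LO)})  # z is unbound
```

The result carries the label Hi because one constituent, z, is Hi. Evaluating the same expression in a store without z yields None rather than a value.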

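Figure 1 defines our operational semantics. The semantics is security-aware,
meaning that it keeps track of security labels on data and propagates these labels
throughout execution in order to track which values might have been influenced
by some high-security data. The semantics operates on machine configurations,
which consist of program state, code, and a list of commands called the
continuation stack (we use a continuation-stack approach solely for the purpose of
simplifying some proofs). The transition arrow of the semantics is annotated with
a program counter label, which is a standard IFC construct used to keep track of
information flow resulting from the control flow of the execution. Whenever an
execution enters a conditional construct, it raises the pc label by the label of the
boolean expression evaluated; the pc label then taints any assignments that are
made within the conditional construct. The transition arrow is also annotated
with a list of outputs (equal to the empty list when not explicitly written) and
the number of steps (equal to 1 when not explicitly written).

\[
\dfrac{\llbracket E \rrbracket s = \mathsf{Some}\ (n, l)}
      {\langle (s, h),\ x := E,\ K \rangle \xrightarrow[l']{} \langle (s[x \mapsto (n,\ l \sqcup l')],\ h),\ \mathsf{skip},\ K \rangle}
\quad (\textsc{Assgn})
\]

\[
\dfrac{\llbracket E \rrbracket s = \mathsf{Some}\ (n_1, l_1) \qquad h(n_1) = \mathsf{Some}\ (n_2, l_2)}
      {\langle (s, h),\ x := [E],\ K \rangle \xrightarrow[l']{} \langle (s[x \mapsto (n_2,\ l_1 \sqcup l_2 \sqcup l')],\ h),\ \mathsf{skip},\ K \rangle}
\quad (\textsc{Read})
\]

\[
\dfrac{\llbracket E \rrbracket s = \mathsf{Some}\ (n_1, l_1) \qquad h(n_1) \neq \mathsf{None} \qquad \llbracket E' \rrbracket s = \mathsf{Some}\ (n_2, l_2)}
      {\langle (s, h),\ [E] := E',\ K \rangle \xrightarrow[l']{} \langle (s,\ h[n_1 \mapsto (n_2,\ l_1 \sqcup l_2 \sqcup l')]),\ \mathsf{skip},\ K \rangle}
\quad (\textsc{Write})
\]

\[
\dfrac{\llbracket E \rrbracket \sigma = \mathsf{Some}\ (n, \mathsf{Lo})}
      {\langle \sigma,\ \mathsf{output}\ E,\ K \rangle \xrightarrow[\mathsf{Lo}]{[n]} \langle \sigma,\ \mathsf{skip},\ K \rangle}
\quad (\textsc{Output})
\]

\[
\dfrac{\llbracket B \rrbracket \sigma = \mathsf{Some}\ (\mathsf{true}, l) \qquad l \sqsubseteq l'}
      {\langle \sigma,\ \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2,\ K \rangle \xrightarrow[l']{} \langle \sigma,\ C_1,\ K \rangle}
\quad (\textsc{If-True})
\]

\[
\dfrac{\llbracket B \rrbracket \sigma = \mathsf{Some}\ (\mathsf{false}, l) \qquad l \sqsubseteq l'}
      {\langle \sigma,\ \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2,\ K \rangle \xrightarrow[l']{} \langle \sigma,\ C_2,\ K \rangle}
\quad (\textsc{If-False})
\]

\[
\dfrac{\llbracket B \rrbracket \sigma = \mathsf{Some}\ (\_, \mathsf{Hi}) \qquad
       \langle \mathsf{mark\_vars}(\sigma,\ \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2),\ \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2,\ [] \rangle \xrightarrow[\mathsf{Hi}]{}{}^{n} \langle \sigma',\ \mathsf{skip},\ [] \rangle}
      {\langle \sigma,\ \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2,\ K \rangle \xrightarrow[\mathsf{Lo}]{} \langle \sigma',\ \mathsf{skip},\ K \rangle}
\quad (\textsc{If-Hi})
\]

\[
\dfrac{\llbracket B \rrbracket \sigma = \mathsf{Some}\ (\mathsf{true}, l) \qquad l \sqsubseteq l'}
      {\langle \sigma,\ \mathsf{while}\ B\ \mathsf{do}\ C,\ K \rangle \xrightarrow[l']{} \langle \sigma,\ C;\ \mathsf{while}\ B\ \mathsf{do}\ C,\ K \rangle}
\quad (\textsc{While-True})
\]

\[
\dfrac{\llbracket B \rrbracket \sigma = \mathsf{Some}\ (\mathsf{false}, l) \qquad l \sqsubseteq l'}
      {\langle \sigma,\ \mathsf{while}\ B\ \mathsf{do}\ C,\ K \rangle \xrightarrow[l']{} \langle \sigma,\ \mathsf{skip},\ K \rangle}
\quad (\textsc{While-False})
\]

\[
\dfrac{\llbracket B \rrbracket \sigma = \mathsf{Some}\ (\_, \mathsf{Hi}) \qquad
       \langle \mathsf{mark\_vars}(\sigma,\ \mathsf{while}\ B\ \mathsf{do}\ C),\ \mathsf{while}\ B\ \mathsf{do}\ C,\ [] \rangle \xrightarrow[\mathsf{Hi}]{}{}^{n} \langle \sigma',\ \mathsf{skip},\ [] \rangle}
      {\langle \sigma,\ \mathsf{while}\ B\ \mathsf{do}\ C,\ K \rangle \xrightarrow[\mathsf{Lo}]{} \langle \sigma',\ \mathsf{skip},\ K \rangle}
\quad (\textsc{While-Hi})
\]

\[
\dfrac{}{\langle \sigma,\ C_1; C_2,\ K \rangle \xrightarrow[l]{} \langle \sigma,\ C_1,\ C_2 :: K \rangle}
\quad (\textsc{Seq})
\qquad
\dfrac{}{\langle \sigma,\ \mathsf{skip},\ C :: K \rangle \xrightarrow[l]{} \langle \sigma,\ C,\ K \rangle}
\quad (\textsc{Skip})
\]

\[
\dfrac{}{\langle \sigma,\ C,\ K \rangle \xrightarrow[l]{}{}^{0} \langle \sigma,\ C,\ K \rangle}
\quad (\textsc{Zero})
\]

\[
\dfrac{\langle \sigma,\ C,\ K \rangle \xrightarrow[l]{o} \langle \sigma',\ C',\ K' \rangle \qquad
       \langle \sigma',\ C',\ K' \rangle \xrightarrow[l]{o'}{}^{n} \langle \sigma'',\ C'',\ K'' \rangle \qquad n > 0}
      {\langle \sigma,\ C,\ K \rangle \xrightarrow[l]{o \mathbin{+\!\!+} o'}{}^{n+1} \langle \sigma'',\ C'',\ K'' \rangle}
\quad (\textsc{Succ})
\]

Fig. 1. Security-Aware Operational Semantics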

Two of the rules for conditional constructs make use of a function called
mark_vars. The function mark_vars($\sigma$, C) alters $\sigma$ by setting the label of each
variable in modifies(C) to Hi, where modifies(C) is a standard syntactic
function returning an overapproximation of the store variables that may be modified
by C. Thus, whenever we raise the pc label to Hi, our semantics taints all store
variables that appear on the left-hand side of an assignment or heap-read
command within the conditional construct, even if some of these commands do not
actually get executed. Note that regardless of which branch of an if statement
is taken, the semantics taints all the variables in both branches. This is required
for noninterference, due to the well-known fact that the lack of assignment in a
branch of an if statement can leak information about the branching expression.
Consider, for example, the following program:

1  y  :=  1; 

2  if  (x  =  0)  then  y  :=  0  else  skip; 

3  if  (y  =  0)  then  skip  else  output  1; 

Suppose x contains Hi data initially, while y contains Lo data. If x is 0, then y
will be assigned 0 at line 2 and tainted with a Hi label (by the pc label). Then
nothing happens at line 3, and the program produces no output. If x is nonzero,
however, nothing happens at line 2, so y still has a Lo label at line 3. Thus the
output command at line 3 executes without issue. Therefore the output of this
program would depend on the Hi data in x even though the instrumented semantics
executes safely, if tainting were applied only to assignments that actually run.
We choose to resolve this issue by using the mark_vars function
in the semantics. Then y will be tainted at line 2 regardless of the value of x,
and so the semantics will get stuck at line 3 when x is nonzero. In other words,
we would only be able to verify this program with a precondition saying that
x = 0; the program is indeed noninterfering with respect to this precondition
(according to our generalized noninterference definition described in Section 2).
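
The two behaviors can be replayed with a small Python sketch that hard-codes the three-line program. It is a simplification for illustration, not the semantics of Figure 1: the names run, try_run, and Stuck are hypothetical, labels are encoded as the integers 0 (Lo) and 1 (Hi), and the use_mark_vars flag switches the up-front tainting on or off so the leak can be observed.

```python
LO, HI = 0, 1


class Stuck(Exception):
    """No rule of the security-aware semantics applies."""


def run(x_value, use_mark_vars):
    """Execute the three-line program with x holding Hi data."""
    store = {"x": (x_value, HI)}
    outputs = []

    # Line 1: y := 1 under a Lo pc label.
    store["y"] = (1, LO)

    # Line 2: if (x = 0) then y := 0 else skip. The guard reads x, so it is Hi.
    pc = store["x"][1]
    if use_mark_vars and pc == HI:
        # Taint every variable in modifies(if ...), here just y, up front.
        store["y"] = (store["y"][0], HI)
    if store["x"][0] == 0:
        store["y"] = (0, pc)  # the assignment is tainted by the raised pc

    # Line 3: if (y = 0) then skip else output 1. The guard reads y.
    y_value, y_label = store["y"]
    if y_value != 0:
        if y_label == HI:
            raise Stuck("output attempted under a Hi pc label")
        outputs.append(1)
    return outputs


def try_run(x_value, use_mark_vars):
    """Return the output list, or the string 'stuck' if execution gets stuck."""
    try:
        return run(x_value, use_mark_vars)
    except Stuck:
        return "stuck"
```

Without the up-front tainting, try_run(0, False) yields an empty output list while try_run(7, False) outputs 1, so the secret leaks. With it, the nonzero case gets stuck instead of producing output, which is why the program is only verifiable under a precondition that forces x = 0.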

The operational semantics presented here is mixed-step and manipulates
security labels directly. In order to make sense of such a non-standard semantics, we
need to relate it in some way to a standard one. A standard, single-step
semantics is defined in the Appendix. This semantics operates on states without labels,
and it does not use continuation stacks. Given a state $\sigma$ with labels, we write $\bar{\sigma}$
to represent the same state with all labels erased from both the store and heap.
We will also use $\tau$ to range over states without labels. Then the following two
theorems hold:

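Theorem 1. Suppose $\langle \sigma, C, [] \rangle \xrightarrow{o}{}^{*} \langle \sigma', \mathsf{skip}, [] \rangle$ in the instrumented
semantics. Then, for some $\tau$, $\langle \bar{\sigma}, C \rangle \xrightarrow{o}{}^{*} \langle \tau, \mathsf{skip} \rangle$ in the standard semantics.

Theorem 2. Suppose $\langle \bar{\sigma}, C \rangle \xrightarrow{o}{}^{*} \langle \tau, \mathsf{skip} \rangle$ in the standard semantics, and
suppose $\langle \sigma, C, [] \rangle$ never gets stuck when executed in the instrumented semantics.
Then, for some $\sigma'$, $\langle \sigma, C, [] \rangle \xrightarrow{o}{}^{*} \langle \sigma', \mathsf{skip}, [] \rangle$ in the instrumented semantics.

These theorems together guarantee that the two semantics produce identical
observable behaviors (outputs) on terminating executions, as long as the
instrumented semantics does not get stuck. Our program logic will of course guarantee
that the instrumented semantics does not get stuck in any execution satisfying
the precondition.

\[
\begin{array}{rcl}
P, Q & ::= & \mathsf{emp} \mid E \mapsto \_ \mid E \mapsto (n, l) \mid B \mid x.\mathrm{lbl} = l \mid x.\mathrm{lbl} \sqsubseteq l \\
     & \mid & \mathrm{lbl}(E) = l \mid \exists X .\, P \mid P \wedge Q \mid P \vee Q \mid P * Q
\end{array}
\]

\[
\llbracket P \rrbracket : \mathcal{P}(\mathit{state})
\]

\[
\begin{array}{lcl}
(s, h) \in \llbracket \mathsf{emp} \rrbracket & \iff & h = \emptyset \\
(s, h) \in \llbracket E \mapsto \_ \rrbracket & \iff & \exists a, n, l .\ \llbracket E \rrbracket s = \mathsf{Some}\ a \wedge h = [a \mapsto (n, l)] \\
(s, h) \in \llbracket E \mapsto (n, l) \rrbracket & \iff & \exists a .\ \llbracket E \rrbracket s = \mathsf{Some}\ a \wedge h = [a \mapsto (n, l)] \\
(s, h) \in \llbracket B \rrbracket & \iff & \llbracket B \rrbracket s = \mathsf{Some}\ \mathsf{true} \\
(s, h) \in \llbracket x.\mathrm{lbl} = l \rrbracket & \iff & \exists n .\ s(x) = \mathsf{Some}\ (n, l) \\
(s, h) \in \llbracket x.\mathrm{lbl} \sqsubseteq l \rrbracket & \iff & \exists n, l' .\ s(x) = \mathsf{Some}\ (n, l') \text{ and } l' \sqsubseteq l \\
(s, h) \in \llbracket \mathrm{lbl}(E) = l \rrbracket & \iff & \bigsqcup_{x \in \mathrm{vars}(E)} \mathrm{snd}(s(x)) = l \\
(s, h) \in \llbracket \exists X .\, P \rrbracket & \iff & \exists v \in \mathbb{Z} + \mathit{Lbl} .\ (s, h) \in \llbracket P[v/X] \rrbracket \\
(s, h) \in \llbracket P \wedge Q \rrbracket & \iff & (s, h) \in \llbracket P \rrbracket \cap \llbracket Q \rrbracket \\
(s, h) \in \llbracket P \vee Q \rrbracket & \iff & (s, h) \in \llbracket P \rrbracket \cup \llbracket Q \rrbracket \\
(s, h) \in \llbracket P * Q \rrbracket & \iff & \exists h_0, h_1 .\ h_0 \uplus h_1 = h \text{ and } (s, h_0) \in \llbracket P \rrbracket \text{ and } (s, h_1) \in \llbracket Q \rrbracket
\end{array}
\]

Fig. 2. Assertion Syntax and Semantics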

4  The  Program  Logic 

In this section, we will present the logic that we use for verifying the security of
a program. A logic judgment takes the form $l \vdash \{P\}\ C\ \{Q\}$. P and Q are the
pre- and postconditions, C is the program to be executed, and l is the pc label
under which the program is verified. P and Q are state assertions, whose syntax
and semantics are given in Figure 2.

Note. We allow assertions to contain logical variables, but we elide the details
here to avoid complicating the presentation. In Figure 2, we claim that the type
of $\llbracket P \rrbracket$ is a set of states; in reality, the type is a function from logical variable
environments to sets of states. In an assertion like $E \mapsto (n, l)$, the n and l may
be logical variables rather than constants.

Definition 1 (Sound judgment). We say that a judgment $l \vdash \{P\}\ C\ \{Q\}$ is
sound if, for any state $\sigma \in \llbracket P \rrbracket$, the following two properties hold:

1. The operational semantics cannot get stuck when executed from initial
configuration $\langle \sigma, C, [] \rangle$ under context l.

2. If the operational semantics executes from initial configuration $\langle \sigma, C, [] \rangle$
under context l and terminates at state $\sigma'$, then $\sigma' \in \llbracket Q \rrbracket$.

Selected inference rules for our logic are shown in Figure 3. The rules make
use of two auxiliary syntactic functions, P\x and P\x.lbl (read the backslash
operator as “delete”). P\x replaces any atomic assertions within P referring to
x by the assertion true. Similarly, P\x.lbl replaces atomic assertions referring
to x.lbl by true. We also sometimes abuse notation and write P\S or P\S.lbl,
where S is a set of variables, to indicate the iterative folding of these functions
over the set S. The important fact about these auxiliary functions is that, if P
holds on some state and we perform an assignment into x, then P\x will hold
on the resulting state. Furthermore, if we change only the label of x without
touching its data (this is done by the mark_vars function described in Section 3),
then P\x.lbl will hold on the resulting state.

Here  are  a  few  interesting  points  to  note  about  these  inference  rules: 

—  While  the  rules  shown  here  mostly  involve  detailed  reasoning  about  label 
propagation,  we  can  also  prove  the  soundness  of  simpler  versions  of  the  rules 
that do not reason about labels and, consequently, do not have any
label-related proof obligations.

—  The  (IF)  and  (WHILE)  rules  may  look  rather  complex,  but  almost  all  of 
that  is  just  describing  how  to  reason  about  the  mark_vars  function  that 
gets  applied  at  the  beginning  of  a  conditional  construct  when  the  pc  label 
increases. 

—  An additional complexity present in the (IF) rule involves the labels $l_t$ and
$l_f$. In fact, these labels describe a novel and interesting feature of our system:
when verifying an if statement, it might be possible to reason that the pc
label gets raised by $l_t$ in one branch and by $l_f$ in the other, based on the
fact that B holds in one branch but not in the other. This is interesting if
$l_t$ and $l_f$ are different labels. In every other static-analysis IFC system we
are aware of, a particular pc label must be determined at the entrance to
the conditional, and this pc label will propagate to both branches. We will
provide an example program later in this section that illustrates this novelty.

Given  our  logic  inference  rules,  we  can  prove  the  following  theorem: 

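Theorem 3 (Soundness). If $l \vdash \{P\}\ C\ \{Q\}$ is derivable according to our
inference rules, then it is a sound judgment, as defined in Definition 1.

\[
\mathsf{mark\_vars}(P, S, l, l') \;\triangleq\;
\begin{cases}
P & \text{if } l \sqsubseteq l' \\
P \backslash S.\mathrm{lbl} \;\wedge \bigwedge_{x \in S} l \sqcup l' \sqsubseteq x.\mathrm{lbl} & \text{otherwise}
\end{cases}
\]

\[
\dfrac{}{l \vdash \{P\}\ \mathsf{skip}\ \{P\}}
\quad (\textsc{Skip})
\qquad
\dfrac{P \Rightarrow \mathrm{lbl}(E) = \mathsf{Lo}}
      {\mathsf{Lo} \vdash \{P\}\ \mathsf{output}\ E\ \{P\}}
\quad (\textsc{Output})
\]

\[
\dfrac{P \Rightarrow \mathrm{lbl}(E) = l}
      {l' \vdash \{P\}\ x := E\ \{(P \backslash x)[E/x] \wedge x.\mathrm{lbl} = l \sqcup l'\}}
\quad (\textsc{Assign})
\]

\[
\dfrac{P \Rightarrow \mathrm{lbl}(E) = l_1 \qquad P \Rightarrow E \mapsto (n, l_2)}
      {l \vdash \{P\}\ x := [E]\ \{P \backslash x \wedge x = n \wedge x.\mathrm{lbl} = l_1 \sqcup l_2 \sqcup l\}}
\quad (\textsc{Read})
\]

\[
\dfrac{P \Rightarrow \mathrm{lbl}(E) = l_1 \qquad P \Rightarrow \mathrm{lbl}(E') = l_2 \qquad P \Rightarrow E \mapsto \_}
      {l \vdash \{P\}\ [E] := E'\ \{P \wedge \exists n .\ E \mapsto (n,\ l_1 \sqcup l_2 \sqcup l) \wedge E' = n\}}
\quad (\textsc{Write})
\]

\[
\dfrac{\begin{array}{c}
       B \wedge P \Rightarrow \mathrm{lbl}(B) = l_t \qquad
       \neg B \wedge P \Rightarrow \mathrm{lbl}(B) = l_f \qquad
       S = \mathrm{modifies}(\mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2) \\
       l_t \sqcup l' \vdash \{B \wedge \mathsf{mark\_vars}(P, S, l_t, l')\}\ C_1\ \{Q\} \\
       l_f \sqcup l' \vdash \{\neg B \wedge \mathsf{mark\_vars}(P, S, l_f, l')\}\ C_2\ \{Q\}
       \end{array}}
      {l' \vdash \{P\}\ \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2\ \{Q\}}
\quad (\textsc{If})
\]

\[
\dfrac{\begin{array}{c}
       P \Rightarrow \mathrm{lbl}(B) = l \qquad S = \mathrm{modifies}(\mathsf{while}\ B\ \mathsf{do}\ C) \\
       l \sqcup l' \vdash \{B \wedge \mathsf{mark\_vars}(P, S, l, l')\}\ C\ \{\mathsf{mark\_vars}(P, S, l, l')\}
       \end{array}}
      {l' \vdash \{P\}\ \mathsf{while}\ B\ \mathsf{do}\ C\ \{\neg B \wedge \mathsf{mark\_vars}(P, S, l, l')\}}
\quad (\textsc{While})
\]

\[
\dfrac{l \vdash \{P\}\ C_1\ \{Q\} \qquad l \vdash \{Q\}\ C_2\ \{R\}}
      {l \vdash \{P\}\ C_1; C_2\ \{R\}}
\quad (\textsc{Seq})
\qquad
\dfrac{P' \Rightarrow P \qquad Q \Rightarrow Q' \qquad l \vdash \{P\}\ C\ \{Q\}}
      {l \vdash \{P'\}\ C\ \{Q'\}}
\quad (\textsc{Conseq})
\]

\[
\dfrac{l \vdash \{P_1\}\ C\ \{Q_1\} \qquad l \vdash \{P_2\}\ C\ \{Q_2\}}
      {l \vdash \{P_1 \wedge P_2\}\ C\ \{Q_1 \wedge Q_2\}}
\quad (\textsc{Conj})
\qquad
\dfrac{l \vdash \{P\}\ C\ \{Q\} \qquad \mathrm{modifies}(C) \cap \mathrm{vars}(R) = \emptyset}
      {l \vdash \{P * R\}\ C\ \{Q * R\}}
\quad (\textsc{Frame})
\]

Fig. 3. Selected Inference Rules for the Logic

1  i := 0;
2  while (i < 64) do
3    x := [A+i];
4    if (x = 0)
5    then
6      output i
7    else
8      skip;
9    i := i+1

Fig. 4. Example: Alice’s Private Calendar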


We will not go over the proof of this theorem here since there is not really
anything novel about it with regard to security. The proof is relatively straightforward
and not significantly different from soundness proofs in other Hoare/separation
logics. The primary theorem in this work is the one that says that any verified
program satisfies our noninterference property; this will be discussed in detail
in Section 5.

4.1  Example:  Alice’s  Calendar 

In  the  remainder  of  this  section,  we  will  show  how  our  logic  can  be  used  to  verify 
an  interesting  example.  Figure  4  shows  a  program  that  we  would  like  to  prove 
is  secure.  Alice  owns  a  calendar  with  64  time  slots  beginning  at  some  location 
designated  by  constant  A.  Each  time  slot  is  either  0  if  she  is  free  at  that  time, 
or  some  nonzero  value  representing  an  event  if  she  is  busy.  Alice  decides  that  all 
free  time  slots  in  her  calendar  should  be  considered  low  security,  while  the  time 
slots  with  events  should  be  secret.  This  policy  allows  for  others  to  schedule  a 
meeting  time  with  her,  as  they  can  determine  when  she  is  available.  Indeed,  the 
example  program  shown  here  prints  out  all  free  time  slots. 

Figure 5 gives an overview of the verification, omitting a few trivial details. In
between each line of code, we show the current pc label and a state predicate that
currently holds. The program is verified with respect to Alice’s policy, described
by the precondition P defined in the figure. This precondition is the iterated
separating conjunction of 64 calendar slots; each slot’s label is Lo if its value is
0 and Hi otherwise. A major novelty of this verification regards the conditional
statement at lines 4-8. As mentioned earlier, in other IFC systems, the label of
the boolean expression “x = 0” would have to be determined at the time of
entering the conditional, and its label would then propagate into both branches
via the pc label. In our system, however, we can reason that the expression’s
label (and hence the resulting pc label) will be different depending on which
branch is taken. If the “true” branch is taken, then we know that x is 0, and
hence we know from the state assertion that its label is Lo. This means that
the pc label is Lo, and so the output statement within this branch will not leak
high-security data. If the “false” branch is taken, however, then we can reason
that the pc label will be Hi, meaning that an output statement could result in
a leaky program (e.g., if the value of x were printed). This program does not
attempt to output anything within this branch, so it is still valid.

\[
P \;\triangleq\; \mathop{\ast}_{i=0}^{63} \bigl( A + i \mapsto (n_i, l_i) \;\wedge\; (n_i = 0 \iff l_i = \mathsf{Lo}) \bigr)
\]

        $\mathsf{Lo} \vdash \{P\}$
1  i := 0;
        $\mathsf{Lo} \vdash \{P \wedge 0 \le i \wedge i.\mathrm{lbl} = \mathsf{Lo}\}$
2  while (i < 64) do
        $\mathsf{Lo} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo}\}$
3    x := [A+i];
        $\mathsf{Lo} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo} \wedge (x = 0 \iff x.\mathrm{lbl} = \mathsf{Lo})\}$
4    if (x = 0)
5    then
        $\mathsf{Lo} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo} \wedge x = 0 \wedge x.\mathrm{lbl} = \mathsf{Lo}\}$
6      output i
        $\mathsf{Lo} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo} \wedge x = 0 \wedge x.\mathrm{lbl} = \mathsf{Lo}\}$
        $\mathsf{Lo} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo}\}$
7    else
        $\mathsf{Hi} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo} \wedge x \neq 0 \wedge x.\mathrm{lbl} = \mathsf{Hi}\}$
8      skip;
        $\mathsf{Hi} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo} \wedge x \neq 0 \wedge x.\mathrm{lbl} = \mathsf{Hi}\}$
        $\mathsf{Hi} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo}\}$
        $\mathsf{Lo} \vdash \{P \wedge 0 \le i < 64 \wedge i.\mathrm{lbl} = \mathsf{Lo}\}$
9    i := i+1
        $\mathsf{Lo} \vdash \{P \wedge 0 \le i \wedge i.\mathrm{lbl} = \mathsf{Lo}\}$
        $\mathsf{Lo} \vdash \{P \wedge i \ge 64 \wedge 0 \le i \wedge i.\mathrm{lbl} = \mathsf{Lo}\}$

Fig. 5. Calendar Example Verification

Since the program is verified with respect to precondition P, the noninterference
guarantee for this example says that if we change any high-security event
in Alice’s calendar to any other high-security event (i.e., nonzero value), then
the output will be unaffected. In other words, an observer cannot distinguish
between any two events occurring at a particular time slot. This seems like exactly
the property Alice would want to have, given that her policy specifies that all
free slots are Lo and all events are Hi.
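
The guarantee can be exercised on concrete inputs with a short simulation. The
following Python sketch is only an illustration and not part of the verified
development: `free_slots` is a hypothetical transliteration of the example
program, `vary_events` is a helper introduced here, and the calendar is modeled
as a plain list of 64 integers in which 0 marks a free slot.

```python
import random

SLOTS = 64  # number of calendar slots, as in precondition P


def free_slots(calendar):
    """Transliteration of the example program: collect every free (zero) slot."""
    outputs = []
    i = 0
    while i < SLOTS:
        x = calendar[i]        # x := [A+i]
        if x == 0:             # "true" branch: x is 0, so its label is Lo
            outputs.append(i)  # output i
        # "false" branch: skip (pc label would be Hi, so no output here)
        i += 1
    return outputs


def vary_events(calendar, rng):
    """Replace each nonzero (Hi) event by some other nonzero value.

    Free slots stay free, so the result still satisfies Alice's policy and
    agrees with the input on all Lo data.
    """
    return [0 if v == 0 else rng.randint(1, 1000) for v in calendar]


rng = random.Random(0)
alice = [rng.choice([0, 0, 7, 42]) for _ in range(SLOTS)]
varied = vary_events(alice, rng)

# Changing Hi events to other Hi events does not change the observable output.
assert free_slots(alice) == free_slots(varied)
```

Such a test only samples a few calendars; the point of the program logic is that
the property holds for every pair of states satisfying P.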


Aside on Completeness. Our system is not complete: there are plenty of
programs that are noninterfering with respect to some precondition, but cannot
be verified in our logic using that precondition. For example, if we slightly modify
the program of Figure 4 by changing line 8 to output i, then the program will
always output all the numbers from 0 to 63 in order, regardless of the values of
high-security data. We would not be able to verify the program, however, because the
pc label is Hi at line 8 and thus disallows any output. Interestingly, we have found
in our experience that we can always rewrite a secure-but-unverifiable program
in such a way that it produces the same output and becomes verifiable. For this
example, it suffices to rewrite the program to simply print out the numbers 0
through 63 (without branching on elements in Alice’s calendar).

A rather more complex example can be obtained by swapping lines 6 and
8 in the code of Figure 4. This program prints out all the time slots that are
not free. Changing any (nonzero) event to any other (nonzero) event will not
change this output, so the program is still secure with respect to Alice’s policy.
It is not verifiable for the same reason as before: output is disallowed at line 8.
Nevertheless, this program can be rewritten in the following way (assume we add
to the precondition that we have a 64-element array filled with Lo 0’s, starting
at location B):

    i := 0;
    while (i < 64) do
        x := [A+i];
        if (x = 0) then [B+i] := 1 else skip;
        i := i+1;
    i := 0;
    while (i < 64) do
        x := [B+i];
        if (x = 0) then output i else skip;
        i := i+1;
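
As a sanity check on this rewrite, the following Python sketch compares the
unverifiable variant (output in the branch where x is nonzero) with the two-pass
version. Both function names are hypothetical, and the arrays at A and B are
modeled as Python lists; this is an informal illustration, not a proof of
equivalence.

```python
SLOTS = 64


def busy_slots_direct(a):
    """Lines 6 and 8 swapped: output happens where x != 0, i.e. under a Hi pc."""
    out = []
    for i in range(SLOTS):
        if a[i] == 0:
            pass              # skip
        else:
            out.append(i)     # output i (disallowed by the logic)
    return out


def busy_slots_two_pass(a):
    """Two-pass rewrite: all writes and outputs occur in x = 0 branches."""
    b = [0] * SLOTS           # array at B, initially filled with Lo 0's
    for i in range(SLOTS):
        if a[i] == 0:         # free slot: value is 0, hence label Lo
            b[i] = 1          # mark slot i as free
    out = []
    for i in range(SLOTS):
        if b[i] == 0:         # unmarked slot: it was not free
            out.append(i)     # output i
    return out


calendar = [0 if i % 5 == 0 else 100 + i for i in range(SLOTS)]
assert busy_slots_direct(calendar) == busy_slots_two_pass(calendar)
```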

The ability to rewrite these safe-but-unverifiable programs is a completely
informal observation we have made. A formal result is beyond the scope of this
paper, but we hope to explore it in future work.

5 Noninterference


In this section, we will discuss the method for formally proving our system’s
security guarantee. Much of the work has already been done through careful
design of the security-aware semantics and the inference rules of the program
logic. The fundamental idea is that we can find a bisimulation relation for our
Lo-context instrumented semantics. This relation will guarantee that two
executions operate in lock-step, always producing the same program continuation
and output.

The bisimulation relation we will use is called observable equivalence. It
intuitively says that the low-security portions of two states are identical; the relation
is commonly used in many IFC systems as a tool for proving noninterference.
In our system, states $\sigma_1$ and $\sigma_2$ are observably equivalent if: (1) they contain
equal values at all locations that are present and Lo in both states; and (2) the
presence and labels of all store variables are the same in both states. This may
seem like a rather odd notion of equivalence (in fact, it is not even transitive,
so “equivalence” is a misnomer here): two states can be observably equivalent
even if some heap location contains Hi data in one state and Lo data in the
other. To see why we need to define observable equivalence in this way, consider
a heap-write command [x] := E where x is a Hi pointer. If we vary the value of
x, then we will end up writing to two different locations in the heap. Suppose
we write to location 100 in one execution and location 200 in the other. Then
location 100 will contain Hi data in the first execution (as the Hi pointer taints
the value written), but it may contain Lo data in the second since we never wrote
to it. Thus we design observable equivalence so that this situation is allowed.

The  following  definitions  describe  observable  equivalence  formally: 

Definition 2 (Observable Equivalence of Stores). Suppose $s_1$ and $s_2$ are
variable stores. We say that they are observably equivalent, written $s_1 \sim s_2$, if,
for all program variables $x$:

— If $s_1(x) = \mathsf{None}$, then $s_2(x) = \mathsf{None}$.

— If $s_1(x) = \mathsf{Some}\,(v_1, \mathsf{Hi})$, then $s_2(x) = \mathsf{Some}\,(v_2, \mathsf{Hi})$ for some $v_2$.

— If $s_1(x) = \mathsf{Some}\,(v, \mathsf{Lo})$, then $s_2(x) = \mathsf{Some}\,(v, \mathsf{Lo})$.


Definition 3 (Observable Equivalence of Heaps). Suppose $h_1$ and $h_2$ are
heaps. We say that they are observably equivalent, written $h_1 \sim h_2$, if for all
natural numbers $n$:

— If $h_1(n) = \mathsf{Some}\,(v_1, \mathsf{Lo})$ and $h_2(n) = \mathsf{Some}\,(v_2, \mathsf{Lo})$, then $v_1 = v_2$.
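
Definitions 2 and 3 translate almost directly into executable checks. The
Python sketch below is an illustration under stated assumptions: stores and
heaps are modeled as dictionaries mapping a variable name or address to a
`(value, label)` pair, an absent key plays the role of `None`, and the function
names are introduced here. The final assertions show the non-transitivity noted
in the text.

```python
LO, HI = "Lo", "Hi"


def stores_equiv(s1, s2):
    """Definition 2: same presence and labels; equal values at Lo variables."""
    if s1.keys() != s2.keys():        # presence must agree in both stores
        return False
    for x, (v1, l1) in s1.items():
        v2, l2 = s2[x]
        if l1 != l2:                  # labels must agree
            return False
        if l1 == LO and v1 != v2:     # Lo values must agree
            return False
    return True


def heaps_equiv(h1, h2):
    """Definition 3: values agree wherever a location is Lo in both heaps."""
    for n in h1.keys() & h2.keys():
        (v1, l1), (v2, l2) = h1[n], h2[n]
        if l1 == LO and l2 == LO and v1 != v2:
            return False
    return True


# Heap equivalence is not transitive: h_a ~ h_b and h_b ~ h_c, but not h_a ~ h_c.
h_a = {100: (1, LO)}
h_b = {100: (5, HI)}
h_c = {100: (2, LO)}
assert heaps_equiv(h_a, h_b) and heaps_equiv(h_b, h_c)
assert not heaps_equiv(h_a, h_c)
```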

We say that two states are observably equivalent (written $\sigma_1 \sim \sigma_2$) when
both their stores and heaps are observably equivalent. Given this definition,
we define a convenient relational denotational semantics for state assertions as
follows:

$$(\sigma_1, \sigma_2) \in \llbracket P \rrbracket^2 \iff \sigma_1 \in \llbracket P \rrbracket \,\land\, \sigma_2 \in \llbracket P \rrbracket \,\land\, \sigma_1 \sim \sigma_2$$

In order to state noninterference cleanly, it helps to define a “bisimulation
semantics” consisting of the following single rule (the side condition will be
discussed below):


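$$\frac{(\sigma_1, C, K) \xrightarrow[\mathsf{Lo}]{\;o\;} (\sigma_1', C', K') \qquad (\sigma_2, C, K) \xrightarrow[\mathsf{Lo}]{\;o\;} (\sigma_2', C', K') \qquad \text{(side condition)}}{(\sigma_1, \sigma_2, C, K) \longrightarrow (\sigma_1', \sigma_2', C', K')}$$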

Note  that  this  bisimulation  semantics  operates  on  configurations  consisting  of  a 
pair  of  states  and  a  program.  With  this  definition,  we  can  split  noninterference 
into  the  following  progress  and  preservation  properties. 

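Theorem 4 (Progress). Suppose we derive $\mathsf{Lo} \vdash \{P\}\, C\, \{Q\}$ using our
program logic. For any $(\sigma_1, \sigma_2) \in \llbracket P \rrbracket^2$, suppose we have

$$(\sigma_1, \sigma_2, C, K) \longrightarrow^{*} (\sigma_1', \sigma_2', C', K'),$$

where $\sigma_1' \sim \sigma_2'$ and $(C', K') \ne (\mathsf{skip}, [\,])$. Then there exist $\sigma_1''$, $\sigma_2''$, $C''$,
and $K''$ such that

$$(\sigma_1', \sigma_2', C', K') \longrightarrow (\sigma_1'', \sigma_2'', C'', K'').$$


Theorem 5 (Preservation). Suppose we have $\sigma_1 \sim \sigma_2$ and
$(\sigma_1, \sigma_2, C, K) \longrightarrow (\sigma_1', \sigma_2', C', K')$. Then $\sigma_1' \sim \sigma_2'$.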


For  the  most  part,  the  proofs  of  these  theorems  are  relatively  straightforward. 
Preservation  requires  proving  the  following  two  simple  lemmas  about  Hi-context 
executions: 


1.  Hi-context  executions  never  produce  output. 

2.  If  the  initial  and  final  values  of  some  location  differ  across  a  Hi-context 
execution,  then  the  location  must  have  a  Hi  label  in  the  final  state. 

There is one significant difficulty in the proof that requires discussion. If C
is a heap-read command x := [E], then Preservation does not obviously hold.
The reason for this comes from our odd definition of observable equivalence; in
particular, the requirements for a heap location to be observably equivalent are
weaker than those for a store variable. Yet the heap-read command is copying
directly from the heap to the store. In more concrete terms, the heap location
pointed to by E might have a Hi label in one state and a Lo label in the other;
but this means x will now have different labels in the two states, violating the
definition of observable equivalence for the store.
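
The failure described above can be reproduced concretely. The Python sketch
below is illustrative only: states are `(store, heap)` pairs of dictionaries
mapping to `(value, label)`, the pointer expression E is restricted to a single
variable `p`, and `heap_read` assumes that the labeled read gives x the join of
the pointer label, the cell label, and the pc label (mirroring the postcondition
of the READ rule). `side_condition` is a hypothetical name for the check that
the two cells read carry the same label.

```python
LO, HI = "Lo", "Hi"


def join(*labels):
    """Least upper bound in the two-point lattice Lo <= Hi."""
    return HI if HI in labels else LO


def stores_equiv(s1, s2):
    """Same presence and labels; equal values at Lo variables."""
    return s1.keys() == s2.keys() and all(
        s1[x][1] == s2[x][1] and (s1[x][1] == HI or s1[x][0] == s2[x][0])
        for x in s1
    )


def heaps_equiv(h1, h2):
    """Values agree wherever a location is Lo in both heaps."""
    return all(
        not (h1[n][1] == LO and h2[n][1] == LO) or h1[n][0] == h2[n][0]
        for n in h1.keys() & h2.keys()
    )


def heap_read(state, x, p, pc=LO):
    """x := [p], assuming x gets the join of pointer, cell, and pc labels."""
    store, heap = state
    addr, ptr_label = store[p]
    value, cell_label = heap[addr]
    new_store = dict(store)
    new_store[x] = (value, join(ptr_label, cell_label, pc))
    return new_store, heap


def side_condition(state1, state2, p):
    """The cells pointed to by p carry the same label in both states."""
    (s1, h1), (s2, h2) = state1, state2
    return h1[s1[p][0]][1] == h2[s2[p][0]][1]


# Location 100 is Hi in one state and Lo in the other: still equivalent.
st1 = ({"p": (100, LO)}, {100: (7, HI)})
st2 = ({"p": (100, LO)}, {100: (0, LO)})
assert stores_equiv(st1[0], st2[0]) and heaps_equiv(st1[1], st2[1])
assert not side_condition(st1, st2, "p")

# After the read, x is Hi in one store and Lo in the other: equivalence is lost.
after1, after2 = heap_read(st1, "x", "p"), heap_read(st2, "x", "p")
assert not stores_equiv(after1[0], after2[0])
```

When the side condition does hold (for instance, both cells are Hi with
different values), the same read leaves the two stores observably equivalent.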

We resolve this difficulty via the side condition in the bisimulation semantics.
The side condition says that the situation we just described does not happen.
More formally, it says that if C has the form x := [E], then the heap location
pointed to by E in $\sigma_1$ has the same label as the heap location pointed to by E
in $\sigma_2$.

This side condition is sufficient for proving Preservation. However, we still
need to show that the side condition holds in order to prove Progress. This fact
comes from induction over the specific inference rules of our logic. For example,
consider the (READ) rule from Section 4:

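$$\frac{P \Rightarrow \mathsf{lbl}(E) = l_1 \qquad P \Rightarrow E \mapsto (n, l_2)}{l \vdash \{P\}\; x := [E]\; \{P \setminus x \,\land\, x = n \,\land\, x.\mathsf{lbl} = l_1 \sqcup l_2 \sqcup l\}}\ (\text{READ})$$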

In order to use this rule, we are required to show that the precondition implies
$E \mapsto (n, l_2)$. Since both states $\sigma_1$ and $\sigma_2$ satisfy the precondition, we see that
the heap locations pointed to by E both have label $l_2$, and so the side condition
holds. Note that the side condition holds even if $l_2$ is a logical variable rather
than a constant.

In order to prove that the side condition holds for every verified program, we
need to show it holds for all inference rules involving a heap-read command. In
particular, this means that no heap-read rule in our logic can have a precondition
that only implies $E \mapsto \_$.

Now  that  we  have  the  Progress  and  Preservation  theorems,  we  can  easily 
combine  them  to  prove  the  overall  noninterference  theorem  for  our  instrumented 
semantics: 

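Theorem 6 (Noninterference, Instrumented Semantics). Suppose we derive
$\mathsf{Lo} \vdash \{P\}\, C\, \{Q\}$ using our program logic. Pick any state $\sigma_1 \in \llbracket P \rrbracket$, and
consider changing the values of any Hi data in $\sigma_1$ to obtain some $\sigma_2 \in \llbracket P \rrbracket$.
Suppose, in the instrumented semantics, we have

$$(\sigma_1, C, [\,]) \xrightarrow[\mathsf{Lo}]{\;o_1\;}{}^{*} (\sigma_1', \mathsf{skip}, [\,])$$

and

$$(\sigma_2, C, [\,]) \xrightarrow[\mathsf{Lo}]{\;o_2\;}{}^{*} (\sigma_2', \mathsf{skip}, [\,]).$$

Then $o_1 = o_2$.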


Finally,  we  can  use  the  results  from  Section  3  along  with  the  safety  guaranteed 
by  our  logic  to  prove  the  final,  end-to-end  noninterference  theorem: 


Theorem 7 (Noninterference, True Semantics). Suppose we derive
$\mathsf{Lo} \vdash \{P\}\, C\, \{Q\}$ using our program logic. Pick any state $\sigma_1 \in \llbracket P \rrbracket$, and consider
changing the values of any Hi data in $\sigma_1$ to obtain some $\sigma_2 \in \llbracket P \rrbracket$. Suppose, in the
true semantics, we have

$$(\sigma_1, C) \xrightarrow{\;o_1\;}{}^{*} (\tau_1, \mathsf{skip})$$

and

$$(\sigma_2, C) \xrightarrow{\;o_2\;}{}^{*} (\tau_2, \mathsf{skip}).$$

Then $o_1 = o_2$.

6 Related Work


There  are  many  different  systems  for  reasoning  about  information  flow.  We  will 
briefly  discuss  some  of  the  more  closely-related  ones  here. 

Some IFC systems with declassification, such as HiStar [22], Flume [8], and
RESIN  [21],  reason  at  the  operating  system  or  process  level,  rather  than  the 
language  level.  These  systems  can  support  complex  security  policies,  but  their 
formal  guarantees  suffer  due  to  how  coarse-grained  they  are. 

On the language-level side of IFC [15], there are many type systems and
program logics that share similarities with our logic.

Amtoft et al. [1] develop a program logic for proving noninterference of a
program written in a simple object-oriented language. They use relational
assertions of the form “x is independent from high-security data.” Such an assertion is
equivalent to saying that x contains Lo data in our system. Thus their logic can
be used to prove that the final values of low-security data are independent from
initial values of high-security data; this is pure noninterference. Note that,
unlike our system, theirs does not attempt to reason about declassification. This is
the primary advantage of our system over theirs. Some other differences between
these IFC systems are:

— We allow pointer arithmetic, while they disallow it by using an
object-oriented language. Pointer arithmetic adds significant complexity to
information flow reasoning. In particular, their system uses a technique similar to
our mark_vars function for reasoning about conditional constructs, except
that they syntactically check for all locations in both the store and heap
that might be modified within the conditional. With the arbitrary pointer
arithmetic of our system, it is not possible to syntactically bound which
heap locations will be written to, so we require the additional semantic
technique described in Section 5 that involves enforcing a side condition on the
bisimulation semantics.

— Our model of observable behavior provides some extra leniency in
verification. Our system allows bad leaks to happen within the program state,
so long as these leaks are not made observable via an output command. In
their system (and most other IFC systems), the enforcement mechanism must
prevent those leaks within program state from happening in the first place.

Banerjee et al. [3] develop an IFC system that specifies declassification
policies through state predicates in basically the same way that we do. For example,
they might have a (relational) precondition of “A(x > y),” saying that two states
agree on the truth value of x > y. This corresponds directly to a precondition
of “x > y” in our system, and security guarantees for the two systems are both
stated relative to the precondition. The two systems have very similar goals, but
there are a number of significant differences in the basic setup that make the
systems quite distinct:

— Their system does not attempt to reason about the program heap at all. They
have some high-level discussions about how one might support pointers in
their setup, but there is nothing formal.

— Their system enforces noninterference primarily through a type system (rather
than a program logic). The declassification policies, specified by something
similar to a Hoare triple, are only used at specific points in the program where
explicit “declassify” commands are executed. A type system enforces pure
noninterference for the rest of the program besides the declassify commands.
Their end-to-end security guarantee then talks about how the knowledge of
an observer can only increase at those points where a declassify command
is executed (a property known in the literature as “gradual release”). Thus
their security guarantee for individual declassification commands looks very
similar to our version of noninterference, but their end-to-end security
guarantee looks quite different. We do not believe that there is any comparable
notion of gradual release in our system, as we do not have explicit program
points where declassification occurs.

—  Because  they  use  a  type  system,  their  system  must  statically  pick  security 
labels  for  each  program  variable.  This  means  that  there  is  no  notion  of 
dynamically  propagating  labels  during  execution,  nor  is  there  any  way  to 
express  our  novel  concept  of  conditional  labels.  As  a  result,  the  calendar 
example  program  of  Section  4  would  not  be  verifiable  in  their  system. 

Delimited Release [16] is an IFC system that allows certain prespecified
expressions (called escape hatches) to be declassified. For example, a
declassification policy for high-security variable h might say that the expression h%2 should
be considered low security. Relaxed Noninterference [9] uses a similar idea, but
builds a lattice of semantic declassification policies, rather than syntactic
escape hatches; e.g., h would have a policy of $\lambda x.\, x\%2$. Our system can easily
express any policy from these systems, using a precondition saying that some
low-security data is equal to the escape hatch function applied to the secret data.
Our strong security guarantee is identical to the formal guarantees of both of
these systems, saying that the high-security value will not affect the observable
behavior as long as the escape hatch valuation is unchanged.
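
To make the encoding concrete, here is a minimal Python sketch of the h%2
policy expressed as a precondition. Everything named here is an assumption made
for illustration: a state is a dictionary holding a Hi secret `h` and a Lo
variable `parity`, `precondition` demands that `parity` equals the escape hatch
applied to `h`, and `program` stands for any code whose output depends only on
Lo data.

```python
def escape_hatch(h):
    """The declassified expression: the parity of the secret."""
    return h % 2


def precondition(state):
    """Lo variable `parity` equals the escape hatch applied to Hi secret `h`."""
    return state["parity"] == escape_hatch(state["h"])


def program(state):
    """Outputs only Lo data, so it reveals nothing beyond the parity of h."""
    return ["odd" if state["parity"] == 1 else "even"]


def make_state(h):
    """Build a state that satisfies the precondition for secret h."""
    return {"h": h, "parity": escape_hatch(h)}


# Two secrets with the same escape-hatch value yield the same observable output.
s1, s2 = make_state(10), make_state(2024)
assert precondition(s1) and precondition(s2)
assert program(s1) == program(s2)

# Secrets with different parity may be distinguished; the policy permits this.
assert program(make_state(10)) != program(make_state(11))
```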

Relational Hoare Type Theory (RHTT) [12] is a logic framework for
verifying information-flow properties. It is based on a highly general relational logic.
The system can be used to reason about a wide variety of security-related
notions, including declassification, information erasure, and state-dependent access
control. One advantage of our system over RHTT is that we have fine-tuned
our system for reasoning about noninterference. A program verification in our
system requires a relatively small amount of work, since much of the
noninterference proof is already handled by the framework. RHTT, on the other hand,
is extremely general to the point that if you want to prove an information flow
property on a program, you need to formulate the property as a relational type
and manually prove that the program has that type. This has to be done for each
program on an individual basis; there are no overarching security properties
that hold for all verified programs.

Intransitive noninterference [10] is a declassification mechanism whereby
certain specific downward flows are allowed in the label lattice. The system formally
verifies that a program obeys the explicitly-allowed flows. These special flows are
intransitive; e.g., we might allow Alice to declassify data to Bob and Bob to
declassify to Charlie, but that does not imply that Alice is allowed to
declassify to Charlie. The intransitive noninterference system is used to verify simple
imperative programs; their language is basically the same as ours, except
without the heap-related commands. One idea for future work is to generalize our
state predicate P into an action G that precisely describes the transformation
that a program is allowed to make on the state. If we implemented this idea,
it would be easy to embed the intransitive noninterference system. The action
G would specify exactly which special flows are allowed (e.g., the data’s label
can be changed from Alice to Bob or from Bob to Charlie, but not from Alice
to Charlie directly). Ideally, we would have a formal noninterference theorem in
terms of G that would give the same result as the formal guarantee in [10].
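
The flow relation in this example is small enough to write down directly. The
Python sketch below is a hypothetical rendering of what an action G might
permit: `ALLOWED` lists the explicitly allowed single-step relabelings, and
`valid_relabeling` checks a sequence of labels step by step. It illustrates only
the intransitivity of the relation, not the formal guarantee of [10].

```python
# Explicitly allowed single-step downward flows (assumed for illustration).
ALLOWED = {("Alice", "Bob"), ("Bob", "Charlie")}


def may_relabel(src, dst):
    """A single step may keep the label or follow one explicitly allowed flow."""
    return src == dst or (src, dst) in ALLOWED


def valid_relabeling(labels):
    """Every consecutive pair in the label history must be an allowed step."""
    return all(may_relabel(a, b) for a, b in zip(labels, labels[1:]))


# Data may pass from Alice to Bob and then from Bob to Charlie ...
assert valid_relabeling(["Alice", "Bob", "Charlie"])
# ... but a direct relabeling from Alice to Charlie is rejected.
assert not valid_relabeling(["Alice", "Charlie"])
```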

All of the language-based IFC systems mentioned so far, including our own
system, use static reasoning. There are also many dynamic IFC systems (e.g., [2,
7, 18, 20]) that attempt to enforce security of a program during execution.
Because dynamic systems are analyzing information flow at runtime, they will incur
some overhead cost in execution time. Static IFC systems need not necessarily
incur extra costs. Indeed, in our system we have a “true machine” that executes
on states with all labels erased. The security-aware machine is for reasoning
purposes only; it will never be physically executed.

Sabelfeld and Sands [17] define a road map for analyzing declassification
policies in terms of four dimensions: who can declassify, what can be declassified,
when can declassification occur, and where can it occur. Our notion of
declassification can talk about any of these dimensions if we construct the precondition
in the right way. The who dimension is most naturally handled via the label
lattice, but one could also imagine representing principals explicitly in the program
state and reasoning about them in the logic. The what dimension is handled by
default, as the program state contains all of the data to be declassified. The when
dimension can easily be reasoned about by including a time field in the state.
Similarly, the where dimension can be reasoned about by including an explicit
program counter in the state.

7  Conclusion 

In  this  paper,  we  described  a  novel  program  logic  for  reasoning  about  information 
flow  in  a  low-level  language.  The  primary  novelties  of  our  system  include: 

1.  Information  flow  reasoning  (including  declassification)  in  the  presence  of 
pointer  arithmetic. 

2.  Connecting  the  static  enforcement  mechanism  with  a  dynamic  semantics 
that  tracks  propagation  of  security  labels. 

3.  Reasoning  about  labels  conditioned  on  state  predicates.  As  far  as  we  are 
aware,  the  example  program  of  Section  4  (which  makes  use  of  conditional 
labels)  cannot  be  verified  as  secure  in  any  other  IFC  system. 

In the future, we hope to extend our work to handle termination-sensitivity,
dynamic memory allocation/deallocation, nondeterminism, and concurrency.

8 Appendix


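$$\frac{\llbracket E \rrbracket s = \mathsf{Some}\ n}{((s, h),\ x := E) \longrightarrow ((s[x \mapsto n], h),\ \mathsf{skip})}\ (\text{ASSGN})$$

$$\frac{\llbracket E \rrbracket s = \mathsf{Some}\ n_1 \qquad h(n_1) = \mathsf{Some}\ n_2}{((s, h),\ x := [E]) \longrightarrow ((s[x \mapsto n_2], h),\ \mathsf{skip})}\ (\text{READ})$$

$$\frac{\llbracket E \rrbracket s = \mathsf{Some}\ n_1 \qquad h(n_1) \ne \mathsf{None} \qquad \llbracket E' \rrbracket s = \mathsf{Some}\ n_2}{((s, h),\ [E] := E') \longrightarrow ((s, h[n_1 \mapsto n_2]),\ \mathsf{skip})}\ (\text{WRITE})$$

$$\frac{\llbracket E \rrbracket \tau = \mathsf{Some}\ n}{(\tau,\ \mathsf{output}\ E) \xrightarrow{\;[n]\;} (\tau,\ \mathsf{skip})}\ (\text{OUTPUT})$$

$$\frac{\llbracket B \rrbracket \tau = \mathsf{Some}\ \mathsf{true}}{(\tau,\ \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2) \longrightarrow (\tau,\ C_1)}\ (\text{IF-TRUE})$$

$$\frac{\llbracket B \rrbracket \tau = \mathsf{Some}\ \mathsf{false}}{(\tau,\ \mathsf{if}\ B\ \mathsf{then}\ C_1\ \mathsf{else}\ C_2) \longrightarrow (\tau,\ C_2)}\ (\text{IF-FALSE})$$

$$\frac{\llbracket B \rrbracket \tau = \mathsf{Some}\ \mathsf{true}}{(\tau,\ \mathsf{while}\ B\ \mathsf{do}\ C) \longrightarrow (\tau,\ C;\ \mathsf{while}\ B\ \mathsf{do}\ C)}\ (\text{WHILE-TRUE})$$

$$\frac{\llbracket B \rrbracket \tau = \mathsf{Some}\ \mathsf{false}}{(\tau,\ \mathsf{while}\ B\ \mathsf{do}\ C) \longrightarrow (\tau,\ \mathsf{skip})}\ (\text{WHILE-FALSE})$$

$$\frac{(\tau,\ C_1) \xrightarrow{\;o\;} (\tau',\ C_1')}{(\tau,\ C_1; C_2) \xrightarrow{\;o\;} (\tau',\ C_1'; C_2)}\ (\text{SEQ})$$

$$\frac{}{(\tau,\ \mathsf{skip}; C) \longrightarrow (\tau,\ C)}\ (\text{SKIP})$$

Fig. 6. Standard Operational Semantics

Abstract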

Verified  compilers  guarantee  the  preservation  of  semantic  properties 
and  thus  enable  formal  verification  of  programs  at  the  source  level. 
However,  important  quantitative  properties  such  as  memory  and  time 
usage  still  have  to  be  verified  at  the  machine  level  where  interactive 
proofs  tend  to  be  more  tedious  and  automation  is  more  challenging. 

This  article  describes  a  framework  that  enables  the  formal 
verification  of  stack-space  bounds  of  compiled  machine  code  at 
the  C  level.  It  consists  of  a  verified  CompCert-based  compiler  that 
preserves  quantitative  properties,  a  verified  quantitative  program 
logic  for  interactive  stack-bound  development,  and  a  verified  stack 
analyzer  that  automatically  derives  stack  bounds  during  compilation. 

The  framework  is  based  on  event  traces  that  record  function  calls 
and  returns.  The  source  language  is  CompCert  Clight  and  the  target 
language  is  x86  assembly.  The  compiler  is  implemented  in  the  Coq 
Proof  Assistant  and  it  is  proved  that  crucial  properties  of  event  traces 
are  preserved  during  compilation.  A  novel  quantitative  Hoare  logic  is 
developed  to  verify  stack-space  bounds  at  the  CompCert  Clight  level. 
The  quantitative  logic  is  implemented  in  Coq  and  proved  sound  with 
respect  to  event  traces  generated  by  the  small-step  semantics  of 
CompCert  Clight.  Stack-space  bounds  can  be  proved  at  the  source 
level  without  taking  into  account  low-level  details  that  depend  on 
the  implementation  of  the  compiler.  The  compiler  fills  in  these 
low-level details during compilation and generates a concrete stack-space
bound that applies to the produced machine code. The verified
stack  analyzer  is  guaranteed  to  automatically  derive  bounds  for 
code  with  non-recursive  functions.  It  generates  a  derivation  in  the 
quantitative  logic  to  ensure  soundness  as  well  as  interoperability 
with  interactively  developed  stack  bounds. 

In  an  experimental  evaluation,  the  developed  framework  is 
used  to  obtain  verified  stack-space  bounds  for  micro  benchmarks 
as  well  as  real  system  code.  The  examples  include  the  verified 
operating-system  kernel  CertiKOS,  parts  of  the  MiBench  embedded 
benchmark  suite,  and  programs  from  the  CompCert  benchmarks. 
The  derived  bounds  are  close  to  the  measured  stack-space  usage  of 
executions  of  the  compiled  programs  on  a  Linux  x86  system. 

Categories and Subject Descriptors D.2.4 [Software Engineer¬

1. Introduction

It  has  been  shown  that  formal  verification  can  greatly  improve 
software  quality  [25,  33,  35].  Consequently,  formal  verification  is 
extensively  studied  in  ongoing  research  and  there  exist  sophisticated 
tools  that  can  verify  important  program  properties  automatically. 
However,  the  most  interesting  program  properties  are  undecidable 
and  user  interaction  is  therefore  inevitable  in  formal  verification. 

If a software system is (partly or entirely) developed in a high-level
language then the question arises on which language level
the  verification  should  be  carried  out.  Formal  verification  at  the 
source  level  has  the  advantage  that  a  developer  can  interact  with  the 
verification  tools  using  the  code  she  has  developed.  This  is  beneficial 
because  the  compiled  code  can  substantially  differ  from  the  source 
code  and  low-level  code  is  harder  to  understand.  Moreover,  even 
fully  automatic  tools  profit  from  the  control-flow  information  and 
the  structure  that  is  available  at  higher  abstraction  layers.  The 
disadvantage  of  verification  at  the  source  level  is  that  tools  such  as 
compilers  have  to  be  part  of  the  trusted  computing  base  and  that  the 
verified  properties  are  not  directly  guaranteed  for  the  code  that  is 
executed  on  the  system. 

Formally verified compilers [11, 24] such as the CompCert C
Compiler [27] guarantee that certain program properties of the
source programs are preserved during compilation. As a result,
CompCert enables source-level verification of the preserved properties
of the compiled code without increasing the size of the trusted
computing base.* In fact, this has been one of the main motivations
for the development of CompCert [27]. However, important
quantitative properties such as memory and time consumption are
neither modeled nor preserved by CompCert and other verified
compilers [11, 24]. Such quantitative properties are nevertheless crucial in
the verification of safety-critical embedded systems. For example,
the DO-178C standard, which is used in the avionics industry
and by regulatory authorities, requires verification activities to show
that a program in executable form complies with its requirements
on stack usage and worst-case execution time (WCET) [28].

Quantitative program requirements such as stack usage and
WCET are usually directly checked at the machine or assembly-code
level “since only at this level is all necessary information
available” [34]. For stack-space bounds there exist commercial abstract
interpretation-based tools, such as AbsInt’s StackAnalyzer [14],
that operate directly on machine code. While such tools can derive
many simple bounds automatically, they rely on user annotations in
the machine code to obtain bounds for more involved programs. The
produced bounds are usually not parametric in the input, and the
analysis is not modular and only applies to specific hardware
platforms. Additionally, the analysis tools used rely on the correctness
of the user annotations and are not formally verified.

In this article, we present the first framework for deriving
formally verified end-to-end stack-space bounds for C programs.
Stack bounds are particularly interesting because stack overflow
is “one of the toughest (and unfortunately common) problems
in embedded systems” [13]. Moreover, stack-memory is the only
dynamically allocated memory in many embedded systems and the
stack usage depends on the implementation of the compiler. While
we focus exclusively on stack bounds in this article, our framework
is developed with other quantitative resources in mind. Many of the
developed techniques can be applied to derive bounds for resources
such as heap memory or clock cycles. However, for clock-cycle
bounds there is a lot of additional work to be done that is beyond the
scope of this article (e.g., developing a formal model for hardware
caches and instruction pipelines).

* If we assume that all verification is carried out using the same trusted base.

The  main  innovation  of  our  framework  is  that  it  enables  the 
formal  verification  of  stack  bounds  for  compiled  x86  assembly 
code  at  the  C  level.  To  gain  the  benefits  of  source-level  verification 
without  the  entailed  disadvantages,  we  have  to  deal  with  three  main 
challenges. 

1. We have to model the stack consumption of programs at the C
level and we have to formally prove that our model is consistent
with the stack consumption of the compiled code.

2. We have to design and implement a C-level verification mechanism
that allows users to derive parametric stack-usage bounds
in an interactive and flexible way.

3.  We  have  to  minimize  user  interaction  during  the  verification  to 
enable  the  verification  of  large  systems. 

To  meet  Challenge  1,  we  use  event  traces  and  verified  compilation. 
Our  starting  point  is  the  CompCert  C  Compiler.  It  relies  on  event 
traces  to  prove  that  a  compiled  program  is  a  refinement  of  the  source 
program.  We  extend  event  traces  with  events  for  function  calls  and 
returns  and  define  a  weight  for  event  traces.  The  weight  describes 
the  stack-space  consumption  of  one  program  execution  as  a  function 
of  a  cost  metric  that  assigns  a  cost  to  individual  call  and  return 
events. The idea is that a user or a (semi-)automatic analysis tool
derives  bounds  on  the  weights  of  event  traces  that  depend  on  the 
stack-frame  sizes  of  the  program  functions.  During  compilation  the 
compiler  produces  a  specific  cost  metric  that  guarantees  that  the 
weight  of  an  event  trace  under  this  metric  is  an  upper  bound  on 
the  stack-space  usage  of  the  compiled  assembly  program  which 
produces  this  trace.  As  a  result,  we  derive  a  verified  upper  bound 
if  we  instantiate  the  derived  memory  bound  with  the  cost  metric 
produced  by  the  compiler. 

We  implemented  the  extended  event  traces  for  full  CompCert  C 
and  all  intermediate  languages  down  to  x86  assembly  in  Coq.  We 
extended  CompCert’s  soundness  theorem  to  take  into  account  the 
weights  of  traces.  In  addition  to  CompCert’s  refinement  theorem 
for  the  original  event  traces,  we  prove  that  compiled  programs 
produce  extended  event  traces  whose  weights  are  less  than  or  equal 
to  the  weights  of  the  traces  at  the  source  level.  This  means  that  we 
allow  reordering  or  deletion  of  call  and  return  events  as  long  as  the 
weight  of  the  trace  is  reduced  or  unchanged.  To  relate  the  weight 
of  traces  to  the  execution  on  a  system  with  finite  stack  space,  we 
modified  the  CompCert  x86  assembly  semantics  into  a  more  realistic 
x86  assembly  that  features  a  finite  stack,  and  reimplemented  the 
assembly  generation  pass  of  CompCert  to  our  new  x86  assembly 
semantics. 

To  meet  Challenge  2,  we  have  developed  and  implemented  a  novel 
quantitative  Hoare  logic  for  CompCert  Clight  in  Coq.  To  account  for 
memory  consumption,  the  assertions  of  the  logic  generalize  the  usual 
boolean-valued assertions of Hoare logic. Instead of the classic true,
our quantitative assertions return a natural number that indicates
the amount of memory that is needed to execute the program. The
boolean false is represented by $\infty$ and indicates that there are no
guarantees provided for the future execution.

We proved the soundness of our quantitative Hoare logic with
respect to Clight and CompCert’s continuation-based small-step
semantics. The soundness theorem states that Hoare triples that
are  derived  with  our  inference  rules  describe  sound  bounds  on  the 
weights  of  traces.  The  logic  can  be  used  for  interactive  stack-bound 
development  or  as  a  backend  for  verified  static  analysis  tools.  For 
clarity,  we  do  not  prove  the  safety  of  programs  and  simply  assume 
that  this  is  done  using  a  different  tool  such  as  Appel’s  separation 
logic  for  Clight  [3].  It  would  be  possible  to  integrate  our  logic  into 
a  separation  logic  for  safety  proofs.  This  would  however  diminish 
the  deployability  of  the  quantitative  logic  as  a  backend  for  static 
stack-bound  analysis  tools  since  they  would  be  required  to  also 
prove  memory  safety. 

To meet Challenge 3, we implemented an automatic stack analyzer
for C programs. To verify the soundness of the stack analyzer,
each successful run generates a derivation in the quantitative Hoare
logic. Not only does this simplify the verification, but it also
allows interoperability with stack bounds that have been interactively
developed in the logic or derived by some other static analysis.
Conceptually, our stack analyzer is rather simple, but we have proved
that it derives bounds for all programs without recursion and
function pointers. This is already sufficient for many programs that are
used in embedded systems. Using our automatic analysis we have
created a verified C compiler that translates a program without
function pointers and recursive calls to x86 assembly and automatically
derives a stack bound for each function in the program including
main().

We  have  successfully  used  our  framework  to  verify  end-to-end 
memory  bounds  for  micro  benchmarks  and  system  software.  Our 
main  example  is  the  CertiKOS  [15]  operating  system  kernel  that  is 
currently  under  development  at  Yale.  Our  automatic  analyzer  finds 
stack  bounds  for  all  functions  in  the  simplified  development  version 
of  CertiKOS  that  is  currently  verified.  Other  examples  are  taken 
from  Leroy’s  CompCert  benchmarks  and  the  MiBench  embedded 
benchmark suite [17]. To evaluate the quality of the verified
stack-space bounds, we experimentally compared the automatically and
manually  verified  bounds  with  the  actual  stack-space  consumption 
during  the  execution  of  the  compiled  C  programs.  Our  experiments 
indicate  that  both  the  manually  and  automatically  derived  bounds 
over-approximate  the  stack  usage  by  exactly  four  bytes.  More  details 
can  be  found  in  Section  6. 

In  summary,  we  make  the  following  contributions. 

•  We  introduce  a  methodology  that  uses  cost  metrics  to  link  event 
traces  to  resource  consumption.  This  approach  enables  us  to 
link  source-level  code  to  the  resource  consumption  of  compiled 
target-level  code. 

•  We  develop  a  novel  quantitative  Hoare  logic  to  reason  about 
the  resource  consumption  of  programs  at  the  source  level.  We 
have  formally  verified  the  soundness  of  the  logic  with  respect  to 
CompCert  Clight  in  Coq. 

•  We  introduce  Quantitative  CompCert,  a  modified  version  of 
the  verified  CompCert  C  Compiler,  in  which  parametric  stack 
bounds are preserved during compilation. Furthermore, Quantitative
CompCert creates a cost metric so that the instantiation of
the  bounds  with  the  metric  forms  an  upper  bound  on  the  memory 
consumption  of  the  compiled  code. 

•  We  have  implemented  and  verified  an  automatic  stack  analyzer. 

•  We  have  evaluated  the  practicability  of  our  framework  with 
experiments  using  micro  benchmarks  and  system  code. 

The complete Coq development and the implemented tools are well
documented and publicly available on the authors’ websites. The
PLDI Artifact Evaluation Committee reproduced samples of our
experiments and tested the implemented tools on additional programs.
The reviewers unanimously stated that our implementation exceeded
their expectations. A companion technical report [9] contains
additional explanation, lemmas, and examples.

2.  An Illustrative Example

In  this  section,  we  sketch  the  verification  of  stack-space  bounds  for 
an  example  program  in  our  framework.  Figure  1  shows  a  C  program 
with  two  integer  parameters:  ALEN  and  SEED. 

This  program  will  fill  an  array  of  size  ALEN  with  an  increasing 
sequence  of  pseudo  random  integers  and  search  through  it.  The 
random  numbers  are  created  by  a  linear  congruential  generator 
initialized  by  the  SEED  parameter.  The  search  procedure  used  is  a 
binary  search  implemented  in  the  recursive  function  search. 

Our  goal  is  to  derive  stack  bounds  for  the  compiled  x86  assembly 
code  of  the  program  that  are  verified  with  respect  to  our  accurate 
x86  model  in  Coq.  The  first  step  is  to  create  an  abstract  syntax  tree 
of  the  code  in  Coq.  This  can  be  done  automatically,  for  instance  by 
using  CompCert’s  parsing  mechanism.  The  second  step  is  to  use  our 
quantitative  Hoare  logic  to  prove  bounds  on  the  function  calls  that 
are  performed  when  executing  main. 

To  relate  function  calls  and  returns  at  different  abstraction  levels 
during  compilation  we  use  call  and  return  events.  For  instance,  an 
execution  of  main  could  produce  the  following  trace. 

call(main),  call(init),  call(random),  ret(random),  ret(init), 
call(search),  call(search),  ret(search),  ret(search),  ret(main) 

From  such  a  trace  and  a  metric  M  that  maps  each  function  name  in 
the  program  to  its  stack-frame  size,  we  can  obtain  the  stack  usage 
of  the  execution  that  produced  the  trace.  For  the  previous  example 
trace,  we  can  for  instance  derive  the  following  stack  usage. 

$M(\mathtt{main}) + \max\{M(\mathtt{init}) + M(\mathtt{random}),\; 2 \cdot M(\mathtt{search})\}$
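
As a sanity check of this formula, the following Python sketch replays the example trace against a metric and records the peak of the running sum of frame sizes. The encoding of events as `(kind, name)` pairs and the byte values in `metric` are illustrative assumptions, not figures produced by the compiler.

```python
# Replay a call/return trace and track the peak stack usage.
# The trace mirrors the example above; the frame sizes are made-up values.

def peak_stack_usage(trace, metric):
    """Return the maximum running sum of frame sizes over the trace."""
    current = 0
    peak = 0
    for kind, name in trace:
        if kind == "call":
            current += metric[name]   # entering a function pushes its frame
        elif kind == "ret":
            current -= metric[name]   # returning pops it again
        else:
            raise ValueError(f"unknown event kind: {kind}")
        peak = max(peak, current)
    return peak


def closed_form(metric):
    """M(main) + max{M(init) + M(random), 2 * M(search)}."""
    return metric["main"] + max(metric["init"] + metric["random"],
                                2 * metric["search"])


example_trace = [
    ("call", "main"), ("call", "init"), ("call", "random"),
    ("ret", "random"), ("ret", "init"),
    ("call", "search"), ("call", "search"),
    ("ret", "search"), ("ret", "search"), ("ret", "main"),
]

# Hypothetical stack-frame sizes in bytes.
metric = {"main": 32, "init": 24, "random": 8, "search": 40}

print(peak_stack_usage(example_trace, metric), closed_form(metric))
```

For any choice of non-negative frame sizes, the replayed peak and the closed form should agree on this particular trace; a different trace would of course yield a different expression.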

In classical Hoare logic, assertions map program states to Booleans.
In our quantitative Hoare logic, assertions map program states to
non-negative numbers. Intuitively, the meaning of a quantitative
Hoare triple {P} S {Q} is the following. For every program state $\sigma$,
$P(\sigma)$ is an upper bound on the stack consumption of the statement
S started in state $\sigma$. Furthermore, Q describes the stack space that
has become available after the execution, as a function of the final
program state. This is similar to type systems and program logics
for amortized resource analysis [5, 21].

We implemented a function in Coq that automatically computes
a derivation in the quantitative logic for a program without
recursive functions. Using this automatic stack analyzer, we derive for
instance the following triple for the function call init().

$\{M(\mathtt{init}) + M(\mathtt{random})\}\ \mathtt{init()}\ \{M(\mathtt{init}) + M(\mathtt{random})\}$

For functions making use of recursion such as search, we derive a
quantitative triple interactively using Coq. For search we derive

$\{L(\mathtt{end} - \mathtt{beg})\}\ \mathtt{search(elem, beg, end)}\ \{L(\mathtt{end} - \mathtt{beg})\}$

where $L(\Delta) = M(\mathtt{search}) \cdot (2 + \log_2(\Delta))$. Since the mathematical
$\log_2$ function is undefined on non-positive values, we take as
convention that $\log_2(\Delta) = +\infty$ when $\Delta < 0$ and $\log_2(0) = 0$.
This trick allows us to simulate a logical precondition stating that
beg must be less than or equal to end before calling search.
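
The logarithmic shape of this bound can be checked informally by counting nested calls. The Python sketch below computes the worst-case recursion depth of search as a function of end − beg and compares it with 2 + log2(end − beg), that is, with the bound divided by M(search). Two points are assumptions of the sketch rather than statements from the text: the logarithm is rounded down to an integer, and the interval length is modelled as a mathematical integer instead of a wrapping u32.

```python
from functools import lru_cache


def floor_log2(delta):
    """Integer log2 following the conventions above (rounding down is an
    assumption of this sketch): +infinity for negative input, 0 for 0."""
    if delta < 0:
        return float("inf")
    if delta == 0:
        return 0
    return delta.bit_length() - 1


@lru_cache(maxsize=None)
def worst_depth(delta):
    """Worst-case number of nested search frames when end - beg == delta."""
    if delta <= 1:
        return 1                      # base case: returns without recursing
    half = delta // 2                 # length of [beg, mid)
    # The recursive call continues on [beg, mid) or on [mid, end).
    return 1 + max(worst_depth(half), worst_depth(delta - half))


def bound_in_frames(delta):
    """The bound L(delta) expressed in units of M(search)."""
    return 2 + floor_log2(delta)


for delta in (0, 1, 2, 3, 8, 1000):
    print(delta, worst_depth(delta), bound_in_frames(delta))
```

A negative argument yields an infinite bound, which is how the convention above encodes the requirement that beg not exceed end.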

For main we combine the previous results and derive the bound

$\{M(\mathtt{main}) + N\}\ \mathtt{main()}\ \{M(\mathtt{main}) + N\}$

where $N = \max(M(\mathtt{init}) + M(\mathtt{random}),\; L(\mathtt{ALEN}))$. To be able
to derive this bound on the main function we have to require that
$0 < \mathtt{ALEN} \le 2^{32} - 1$. In the Coq development this is stated as a
section hypothesis, which is later instantiated when ALEN is
chosen by the user before compiling.

    typedef unsigned int u32;
    u32 a[ALEN];
    u32 seed = SEED;

    /* Recursive binary search on the interval [beg, end). */
    u32 search(u32 elem, u32 beg, u32 end) {
      u32 mid = beg + (end-beg) / 2;
      if (end-beg <= 1) return beg;
      if (a[mid] > elem) end = mid;
      else beg = mid;
      return search(elem, beg, end);
    }

    /* Linear congruential generator. */
    u32 random() {
      seed = (seed * 1664525) + 1013904223;
      return seed;
    }

    /* Fill a with an increasing pseudo-random sequence. */
    void init() {
      u32 i, rnd, prev = 0;
      for (i=0; i<ALEN; i++) {
        rnd = random();
        a[i] = prev + rnd % 17;
        prev = a[i];
      }
    }

    int main() {
      u32 idx, elem;
      init();
      elem = random() % (17 * ALEN);
      idx = search(elem, 0, ALEN);
      return a[idx] == elem;
    }

Figure 1. An illustrative example for static stack-bound computation.
Constant stack bounds for the non-recursive functions are derived
automatically. The logarithmic bound for the function search
is derived with a hand-crafted proof in our quantitative Hoare logic.

The third and final step in the derivation of the stack bounds
is to compile the program with Quantitative CompCert, our modified
CompCert C Compiler. The compiler produces x86 assembly code and a
concrete metric $M_0$. It follows from CompCert’s correctness
theorem that the compiled code is a semantic refinement of our
source program. In addition, we have formally verified that the
metric $M_0$ correctly relates the abstractly defined stack
consumption (using the event traces) to the actual stack consumption
in our abstract x86 machine. Moreover, we have verified that
applying $M_0$ to the preconditions in the triples of the quantitative
Hoare logic results in sound stack bounds on the x86 machine. The
final bounds that we obtain for our example are, for instance, 32
bytes for init() and $112 + 40 \cdot \log_2(\mathtt{ALEN})$ bytes for main().
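
To make the final figure for main() concrete, the short Python sketch below evaluates 112 + 40 · log2(ALEN) for a few array lengths. The constants 112 and 40 are the ones quoted above; the sample lengths are arbitrary powers of two, chosen so that the logarithm is exact and no rounding convention has to be assumed.

```python
import math


def main_stack_bound(alen):
    """Evaluate 112 + 40 * log2(alen), in bytes, for a positive length."""
    if alen <= 0:
        raise ValueError("the bound is only derived for 0 < ALEN")
    return 112 + 40 * math.log2(alen)


# Arbitrary sample lengths (powers of two keep log2 exact).
for alen in (1, 16, 1024, 2**20):
    print(f"ALEN = {alen:>8}: at most {main_stack_bound(alen):.0f} bytes")
```

Even for an array of about a million entries the bound stays below one kilobyte, which reflects the logarithmic recursion depth of the binary search.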

3.  Quantitative  CompCert:  Verified  Stack-Aware 
Compilation 

In this section, we introduce our new technique for verifying
quantitative compiler correctness and its implementation in Quantitative
CompCert. We focus on stack-space usage but believe that similar
techniques can be used to bound the time and heap-space
requirements of programs. Our development is highly influenced by the
design of CompCert [27], a verified compiler for the C language.
CompCert C accepts most of the ISO C90 language and produces
machine code for the IA32 architecture (among others). CompCert
uses 11 intermediate languages and 20 passes to compile a C AST
to x86 assembly.

The soundness proof of CompCert is based on trace-based operational
semantics for the source, target, and intermediate languages.
These  semantics  generate  traces  of  events  during  the  execution  of 
programs.  Events  include  input/output  and  external  function  calls. 
The  soundness  theorem  of  CompCert  states  that  every  event  trace 
that  can  be  generated  by  the  compiled  program  can  also  be  generated 
by  the  source  program  provided  that  the  source  program  does  not 
go  wrong.  In  other  words,  the  compiled  program  is  a  refinement  of 
the  source  program  with  respect  to  the  observable  events. 

Figure 2. Overview of our quantitative verification framework. We write $W_M(s) = \sup\{W_M(B) \mid B \in [\![s]\!]\}$ for the weight of the program
s under the metric M. We write stack(s) for the smallest number n so that s runs without stack overflow if executed with a stack of size n.


3.1  Quantitative  Compiler  Correctness 

In  the  following,  we  show  how  to  extend  trace-based  compiler- 
correctness  proofs  to  also  cover  stack-space  consumption.  In  short, 
our  technique  works  as  follows. 

1. We generate events for all semantic actions that are relevant for
stack-space  usage,  that  is,  function  calls  and  returns. 

2.  We  define  a  weight  function  for  event  traces  that  describes  the 
stack-space  consumption  of  program  executions  that  produce 
that  trace.  The  weight  of  an  event  trace  is  parameterized  by  a 
resource  metric  that  describes  the  cost  of  each  event. 

3.  We  formally  verify  that  for  all  resource  metrics  and  for  all 
event  traces  produced  by  a  target  program,  the  source  program 
either  goes  wrong  or  produces  an  equivalent  (see  the  following 
definition)  event  trace  with  a  greater  or  equal  weight. 

4.  During  compilation,  we  produce  a  cost  metric  that  accurately 
describes  the  memory  consumption  of  target  programs:  If  an 
execution  of  a  target  program  produces  an  event  trace  of  weight  n 
under  the  produced  metric  then  this  execution  can  be  performed 
on  a  system  with  stack  size  n. 

We  now  formalize  and  elaborate  on  these  points. 

Event  Traces  In  CompCert,  the  observable  events  are  external 
function  calls  (e.g.,  I/O  events)  that  are  represented  by  function 
identifiers  together  with  a  list  of  input  values  and  an  output  value 
as  given  by  the  following  grammar.  To  track  stack  usage,  we  add 
memory  events  for  internal  function  calls  and  returns.  In  contrast 
to  I/O  events,  memory  events  do  not  have  to  be  preserved  during 
compilation. 

Event values    $v ::= \mathrm{int}(n) \mid \mathrm{float}(g)$

I/O events      $\nu ::= f(\vec{v} \mapsto v)$

Memory events   $\mu ::= \mathrm{call}(x) \mid \mathrm{ret}(x)$

Event traces are defined in a similar way to CompCert. We
distinguish finite (inductive) traces $t$ and possibly infinite (coinductive)
traces $T$. A program behavior is either a converging computation
$\mathrm{conv}(t, n)$ producing a finite event trace $t$ and a return code $n$, a
diverging computation $\mathrm{div}(T)$ producing a finite or infinite trace
$T$, or a computation $\mathrm{fail}(t)$ that goes wrong and produces the finite
trace $t$.

Finite event traces        $t ::= \epsilon \mid \nu \cdot t \mid \mu \cdot t$

Coinductive event traces   $T ::= \epsilon \mid \nu \cdot T \mid \mu \cdot T$

Behaviors                  $B ::= \mathrm{conv}(t, n) \mid \mathrm{div}(T) \mid \mathrm{fail}(t)$

We write $\mathcal{E}$ for the set of memory and I/O events, $\mathcal{B}$ for the set of
behaviors, and $\mathcal{T}$ for the set of traces.


Weights of Behaviors  For a behavior $B$, we define the set of finite
prefix traces $\mathit{prefs}(B)$ of $B$ as follows.

$\mathit{prefs}(\mathrm{conv}(t, n)) = \{t_1 \mid t = t_1 \cdot t_2\}$
$\mathit{prefs}(\mathrm{div}(T)) = \{t \mid T = t \cdot T'\}$
$\mathit{prefs}(\mathrm{fail}(t)) = \{t_1 \mid t = t_1 \cdot t_2\}$

The weight $W_M(B) \in \mathbb{N} \cup \{\infty\}$ of a behavior $B$ describes the
number of bytes that are needed in an execution that produces $B$. It
is parameterized by a resource metric $M : \mathcal{E} \to \mathbb{Z}$ that maps events
to integers (bytes). The purpose of the metric in our work is to relate
memory events to the sizes of the stack frames of functions in the
target code. To this end, we only use stack metrics, that is, metrics
$M$ such that for all functions $f$ and for all external functions $g$

$0 \le M(\mathrm{call}(f)) = -M(\mathrm{ret}(f))$ and $M(g(\vec{v} \mapsto v)) = 0$.

In  the  Coq  implementation  of  our  compiler,  we  can  also  deal  with 
nonzero  stack  consumption  for  external  functions  as  long  as  the 
stack  consumption  of  each  call  is  bounded  by  a  constant. 

Before we define the weight, we first inductively define the
valuation $V_M(t)$ of a finite trace $t$.

$V_M(\epsilon) = 0$ and $V_M(\alpha \cdot t) = V_M(t) + M(\alpha)$

We now define the weight $W_M(T)$ of a potentially infinite trace
$T$ and the weight $W_M(B)$ of a behavior $B$ under the metric $M$ as
follows:

$W_M(T) = \sup\{V_M(t) \mid T = t \cdot T'\}$

$W_M(B) = \sup\{V_M(t) \mid t \in \mathit{prefs}(B)\}$
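
For finite traces, these definitions translate almost literally into executable form. The Python sketch below is a minimal model under stated assumptions: events are encoded as tuples `("call", f)`, `("ret", f)`, or `("io", ...)`, which is a choice made here for illustration, and infinite (coinductive) traces are not modelled at all. Since the prefixes of conv(t, n) and fail(t) are the prefixes of t, the weight of such a behavior is the weight of its trace.

```python
def make_stack_metric(frame_sizes):
    """Build a stack metric M from non-negative per-function frame sizes:
    M(call(f)) = size, M(ret(f)) = -size, and M(io) = 0."""
    if any(size < 0 for size in frame_sizes.values()):
        raise ValueError("a stack metric needs non-negative frame sizes")

    def metric(event):
        kind = event[0]
        if kind == "call":
            return frame_sizes[event[1]]
        if kind == "ret":
            return -frame_sizes[event[1]]
        return 0                      # I/O events do not use stack space
    return metric


def valuation(trace, metric):
    """V_M(t): the sum of the metric over all events of a finite trace."""
    return sum(metric(event) for event in trace)


def weight(trace, metric):
    """W_M(t): the supremum of V_M over all prefixes of a finite trace,
    including the empty prefix, so the result is never negative."""
    return max(valuation(trace[:i], metric) for i in range(len(trace) + 1))


# Hypothetical trace and frame sizes, for illustration only.
trace = [("call", "f"), ("io", "print", (1,), 0),
         ("call", "g"), ("ret", "g"), ("ret", "f")]
m = make_stack_metric({"f": 16, "g": 8})
print(valuation(trace, m), weight(trace, m))
```

The valuation of a balanced trace is zero, whereas its weight records the high-water mark reached along the way; this is why the weight, not the valuation, is the quantity of interest.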

Quantitative Refinement  For our description of quantitative
refinements we leave the definition of programs abstract. A program
$s \in \mathcal{P}$ is simply an object that is associated, through a function
$[\![\cdot]\!] : \mathcal{P} \to 2^{\mathcal{B}}$, with a set of behaviors $[\![s]\!] \subseteq \mathcal{B}$. An execution of a
program can produce different traces, either due to non-determinism
in the semantics or due to user inputs recorded in the event traces.

For a behavior $B$ we define the pruned behavior $\overline{B}$ as the behavior
that results from deleting all memory events (call(x) or ret(x))
from $B$. The formal definition can be found in the TR [9].

In CompCert, compiler correctness is formalized through the
notion of refinement. A (target) program $s'$ is a refinement of a
(source) program $s$, written $s' \preceq s$, if for every behavior $B' \in [\![s']\!]$
there is $B \in [\![s]\!]$ such that the pruned behaviors agree, $\overline{B} = \overline{B'}$,
or $\mathrm{fail}(t) \in [\![s]\!]$ for some trace $t$.²
Note that memory events are not taken into account in CompCert’s
classic definition of refinement.


² In fact, it is enough to prove that $B' \sim B$ (bisimilarity of infinite traces),
because $[\![s]\!]$ is closed under bisimilarity.

Figure 3. Our modified stack-aware CompCert C compiler. We replace CompCert’s x86 assembly with the more realistic x86 assembly
semantics $\mathrm{ASM}_{sz}$ with finite stack. Pseudo assembly instructions such as Pallocframe and Pfreeframe are not needed anymore.


To also relate the memory events in the behaviors of two
programs, we define a novel quantitative refinement. A (target)
program $s'$ is a quantitative refinement of a (source) program $s$,
written $s' \preceq_Q s$, if the following holds. For every behavior $B' \in [\![s']\!]$
there exists $B \in [\![s]\!]$ such that the pruned behaviors agree, $\overline{B} = \overline{B'}$,
and $W_M(B') \le W_M(B)$ for all stack metrics $M$, or $\mathrm{fail}(t) \in [\![s]\!]$ for
some trace $t$. In Quantitative CompCert, our modified CompCert compiler,
we prove for each compiler pass $C$ that $C(s) \preceq_Q s$ for every program $s$.
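
The two conditions of quantitative refinement can be illustrated on a pair of finite behaviors. The Python sketch below is only a sanity check under stated assumptions: events use a tuple encoding chosen here for illustration, and the definition's quantification over all stack metrics is replaced by a handful of sample metrics, so the check can refute a refinement but never prove one. It needs Python 3.8 or later for the `initial` argument of `itertools.accumulate`.

```python
from itertools import accumulate


def prune(trace):
    """Delete all memory events, keeping only the I/O events."""
    return [e for e in trace if e[0] not in ("call", "ret")]


def cost(event, frame_sizes):
    """Stack metric induced by a table of frame sizes."""
    if event[0] == "call":
        return frame_sizes[event[1]]
    if event[0] == "ret":
        return -frame_sizes[event[1]]
    return 0


def weight(trace, frame_sizes):
    """Largest running sum of costs over all prefixes (at least 0)."""
    return max(accumulate((cost(e, frame_sizes) for e in trace), initial=0))


def refines_on_samples(target, source, sample_metrics):
    """Check pruned equality and the weight inequality on sample metrics."""
    if prune(target) != prune(source):
        return False
    return all(weight(target, m) <= weight(source, m) for m in sample_metrics)


# Hypothetical example: the target trace results from inlining g into f,
# which removes g's call and return events but keeps the I/O event.
source = [("call", "f"), ("call", "g"), ("io", "print", (1,), 0),
          ("ret", "g"), ("ret", "f")]
target = [("call", "f"), ("io", "print", (1,), 0), ("ret", "f")]
samples = [{"f": 16, "g": 8}, {"f": 0, "g": 4}, {"f": 100, "g": 1}]

print(refines_on_samples(target, source, samples))   # inlining direction
print(refines_on_samples(source, target, samples))   # reverse direction
```

The example reflects the remark made earlier that call and return events may be deleted or reordered as long as the weight does not increase: removing the events of g is acceptable, while adding them back is not.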

Verifying Stack-Space Usage  Figure 2 summarizes how we verify
the stack-space usage of a program in our framework. First, we
prove a bound $\beta : (\mathcal{E} \to \mathbb{Z}) \to \mathbb{N}$ on the weights of the event
traces that a program can produce. This bound is parameterized by
an event metric $M : \mathcal{E} \to \mathbb{Z}$. Second, our verified compiler ensures,
thanks to quantitative refinement, that the computed bound also
holds for the weights of the traces of the compiled program.

Third, we have to relate the computed bound to the actual stack
usage of the compiled code. Therefore, our compiler computes not
only a target program $C(s)$ but also a metric $M_s$ such that $C(s)$
can be safely executed with a stack-memory size of
$\sup\{W_{M_s}(B) \mid B \in [\![C(s)]\!]\}$ bytes. As a result, the initially derived bound for the
source code can be instantiated with the metric $M_s$ to obtain the
wanted stack-space bound $\beta(M_s)$ for the target program.

In  this  overview  picture,  we  assume  that  the  semantics  of  the 
target  and  source  languages  are  both  formulated  with  an  unbounded 
stack.  The  final  step  of  the  soundness  proof  (not  illustrated  in 
Figure  2)  is  to  relate  the  trace-based  semantics  of  the  target  language 
to  a  realistic  assembly  semantics  in  which  the  program  is  executed 
with a fixed stack size. To this end, we prove that an execution
of $C(s)$ with bounded stack space $\sup\{W_{M_s}(B) \mid B \in [\![C(s)]\!]\}$
is a refinement of the execution of $C(s)$ in the semantics with
unbounded stack (see explanation in Section 3.2).

3.2  Verification  and  Implementation 

We  implemented  the  verification  framework  that  we  outlined  in 
Section  3.1  for  the  CompCert  C  compiler  using  the  proof  assistant 
Coq.  The  verification  consists  of  about  5000  lines  of  Coq  code  that 
we  integrated  into  CompCert  1.13  (which  originally  consists  of 
about  90000  lines  of  Coq  code)  to  obtain  a  modified  version  that  we 
call Quantitative CompCert. CompCert 1.13 is decomposed into 20
passes between 11 intermediate languages (see [9] for an overview).
We  describe  our  modified  Quantitative  CompCert  in  this  section. 

The problem: stack consumption in CompCert  For each intermediate
language of CompCert (beyond C subsets), each function call
allocates  a  memory  region — called  the  stack  frame — to  store  its 
addressable  local  variables,  and  later  the  spilling  locations  and  the 
function  arguments  to  handle  the  calling  conventions.  This  stack 
frame  is  freed  upon  function  return.  However,  even  though  each 
stack  frame  is  finite,  there  may  well  be  an  unbounded  number  of 
such  allocations,  even  for  nested  function  calls.  Indeed,  in  CompCert, 
allocating  a  stack  frame  always  succeeds,  thus  CompCert  does  not 
model  stack  overflow. 

Our solution: Quantitative CompCert  In Quantitative CompCert,
we overcome this issue by modifying the semantics of the target
assembly language. We preallocate a finite memory region for the
whole stack, into which all stack frames shall be merged together
during the execution instead of being individually allocated.

By  contrast,  we  still  want  the  source  and  intermediate  languages 
to  allocate  an  individual  stack  frame  per  function  call.  First,  we 
want  to  change  CompCert  only  if  necessary  so  as  to  still  support  all 
features  of  CompCert  C.  Second,  it  would  not  be  very  meaningful 
to  introduce  a  finite  stack  at  a  high  language  level  since  it  is  unclear 
how  to  model  stack  sizes.  The  only  major  change  we  bring  to  those 
languages  is  to  introduce  our  call  and  return  events  into  the  trace. 

As  shown  in  Figure  3,  this  leads  us  to  split  CompCert  into 
two  parts.  In  the  first  part,  we  compile  CompCert  C  down  to 
the  CompCert  Mach  low-level  language  (which  comes  just  before 
assembly  generation)  by  adapting  the  proofs  of  existing  passes  to 
quantitative  refinement.  In  the  second  part,  we  perform  two  passes 
to  merge  all  stack  frames  together.  The  key  point  of  our  work  is  that 
this  second  part  will  require  the  Mach  traces  to  not  stack  overflow, 
which  justifies  the  use  of  quantitative  refinement  for  the  first  part. 

Quantitative  Refinement  In  the  first  part  of  the  compiler,  from 
CompCert  C  down  to  Mach,  we  add  call  and  return  events  to  the 
semantics  of  each  language,  at  the  level  of  each  function  call  and 
return  (as  described  in  Section  3.1).  This  change  is  uniform  in  all 
languages from CompCert C to Mach: indeed, in each small-step
operational  semantics,  the  rules  responsible  for  internal  function 
call  and  return  all  have  the  same  shape. 

Then,  thanks  to  these  changes,  we  support  all  of  CompCert  1.13 
passes  except  two  optional  optimizations  (see  Section  3.3),  and, 
with  no  significant  changes  to  the  proofs,  we  prove  that  they  exactly 
preserve  traces  with  function  call  events. 

Generation  of  Target  Cost  Metric  The  semantics  of  CompCert 
C  allocates  a  separate  memory  region  for  each  addressable  local 
variable.  In  Mach,  all  those  variables  as  well  as  the  spilling  locations, 
the  function  arguments,  and  the  return  address  are  stored  in  a 
stack  frame.  Actually,  the  stack  frame  of  a  Mach  function  call  is 
completely  laid  out,  so  that  no  additional  memory  is  necessary  when 
generating  the  CompCert  x86  assembly  code.  This  means  that,  at 
the  level  of  Mach,  we  already  know  the  stack  size  necessary  for  a 
function  call  (thanks  to  the  fact  that  the  original  CompCert  does  not 
support some C features, see Section 3.3): for a given function, this size is
constant  and  does  not  depend  on  the  arguments  nor  the  input.  So,  we 
can  use  the  sizes  of  Mach  stack  frames  as  cost  metric  for  functions 
to  accurately  estimate  stack  bounds  at  the  source  level. 

Generation of Assembly Code  Recall that CompCert x86 assembly
language is not realistic enough as it does not prevent a program
from allocating an infinite number of stack frames. Our goal, as one
of the main applications of our quantitative refinement, is to make
the CompCert x86 assembly language more realistic by having it
model a contiguous finite stack that is preallocated at the beginning
of the program. The semantics of our new CompCert x86 assembly
is parameterized by the size $sz + 4$ of the whole stack (provided,
in most cases, by the host operating system).³ We call this new x86
semantics $\mathrm{ASM}_{sz}$. We design it in such a way that an execution goes
wrong if the program tries to access more than $sz$ bytes of stack. In
other words, stack overflow becomes possible in $\mathrm{ASM}_{sz}$.


³ $sz$ is the stack size actually consumed by the program starting from main,
but we have to account for the return address of the “caller” of main.

Because  the  notion  of  function  call  is  no  longer  relevant  (there  is 
no  “control  stack”),  we  lose  the  ability  to  extend  this  semantics  with 
call  and  return  events.  So,  rather  than  quantitative  refinement,  we 
are  actually  interested  in  whether  a  CompCert  C  source  program  can 
run on $\mathrm{ASM}_{sz}$ without going wrong because of stack overflow. The
correctness  of  our  Quantitative  CompCert  compiler  is  formalized  by 
the  following  theorem. 

Theorem 1.  Let $sz + 4 \in [4, 2^{32})$ be the size of the whole target
stack. Consider a CompCert C source program $S$ and assume the
following:

1. $S$ does not go wrong in the ordinary setting of unbounded stack
space, that is, $\mathrm{fail}(t) \notin [\![S]\!]$ for every trace $t$.

2. Quantitative CompCert produces a Mach intermediate target
code $I$, with the sizes of stack frames⁴ $SF$ and the subsequent
cost metric $M(f) = SF(f) + 4$.

3. The stack bounds of $S$ inferred at the source level are at most
$sz$ under the Mach cost metric $M$: $\forall B \in [\![S]\!],\ W_M(B) \le sz$.

4. From $I$, our compiler produces a target assembly code $T$.

Then, when run in $\mathrm{ASM}_{sz}$, $T$ refines $S$ in the sense of CompCert:
$\forall B' \in [\![T]\!]_{sz},\ \exists B \in [\![S]\!],\ \overline{B'} = \overline{B}$ (equality of pruned
behaviors). In particular, $T$ cannot go wrong and thus does not stack overflow.
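
The way the hypotheses of Theorem 1 fit together can be illustrated with a toy model. The Python sketch below derives the cost metric M(f) = SF(f) + 4 from a table of frame sizes, checks hypothesis 3 for one finite trace, and replays that trace against a stack budget of sz bytes. The frame sizes, the trace, and the replay function are hypothetical stand-ins introduced here; they are not the Coq semantics of the finite-stack assembly language.

```python
def mach_metric(frame_sizes):
    """M(f) = SF(f) + 4, as in hypothesis 2 of the theorem."""
    return {name: size + 4 for name, size in frame_sizes.items()}


def weight(trace, metric):
    """Peak of the running sum of call/return costs (at least 0)."""
    used = peak = 0
    for kind, name in trace:
        used += metric[name] if kind == "call" else -metric[name]
        peak = max(peak, used)
    return peak


def runs_within(trace, metric, sz):
    """Replay the trace; report False as soon as usage would exceed sz."""
    used = 0
    for kind, name in trace:
        used += metric[name] if kind == "call" else -metric[name]
        if used > sz:
            return False              # would overflow the preallocated stack
    return True


# Hypothetical Mach frame sizes (bytes) and one execution of main.
SF = {"main": 28, "search": 36}
M = mach_metric(SF)
trace = [("call", "main"), ("call", "search"), ("call", "search"),
         ("ret", "search"), ("ret", "search"), ("ret", "main")]

sz = weight(trace, M)                 # smallest budget satisfying hypothesis 3
print(sz, runs_within(trace, M, sz), runs_within(trace, M, sz - 1))
```

In this toy setting a budget equal to the weight is exactly sufficient, and one byte less already fails, which mirrors the role of the inequality in hypothesis 3.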

It is important to first prove that $S$ cannot go wrong in unbounded
stack space. Indeed, the correctness of our assembly generation
depends on the fact that the weights of Mach traces are lower than
$sz$. If $S$ were to have a wrong behavior $\mathrm{fail}(t)$ then $I$ might actually
have a behavior $t \cdot B$ whose weight could well exceed $sz$ even
though $W_M(\mathrm{fail}(t))$ does not. As each pass is proved independently
of the others, it is not possible to track the behaviors of $I$ that could
potentially come from wrong behaviors of $S$.

In the original CompCert x86 assembly language, the notion
of stack frame is still kept, so that this language has two
pseudo-instructions Pallocframe and Pfreeframe responsible for allocating
and freeing the corresponding memory block, even though those
pseudo-instructions are then turned into real x86 assembly
instructions performing pointer arithmetic with the ESP stack pointer
register. This latter transformation cannot be proved correct in
CompCert because pointer arithmetic cannot cross block
boundaries in the CompCert memory model. Therefore this transformation
is done in an unverified “pretty-printing” stage, after CompCert has
generated the x86 assembly code of the source program.

Our new assembly semantics overcomes this limitation. Now,
instead of allocating different memory blocks, we preallocate one
single block of size $sz + 4$ at the beginning of the program to
hold the whole stack, and our assembly generation pass ensures
that the value of ESP always points within this block. Therefore
the pseudo-instructions are no longer necessary, and the pointer
arithmetic needed at function entry and exit can be performed
within our formalized $\mathrm{ASM}_{sz}$ assembly language.
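The single-block stack discipline can be illustrated with a small C model. This is only a sketch under stated assumptions: the type `stack_block` and the functions `stack_init`, `stack_enter`, and `stack_leave` are hypothetical names, ESP is modeled as an offset into the block rather than a machine register, and `total` plays the role of $sz + 4$. Each function entry is charged its frame size plus 4 bytes, mirroring the cost metric $M(f) = SF(f) + 4$.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the whole stack is one preallocated block and
   ESP is an offset into it. The stack grows toward offset 0. */
typedef struct {
    size_t size; /* total bytes in the block (sz + 4 in the text) */
    size_t esp;  /* current stack-pointer offset, 0 <= esp <= size */
} stack_block;

/* Start with an empty stack: ESP sits at the top of the block. */
stack_block stack_init(size_t total) {
    stack_block s = { total, total };
    return s;
}

/* Function entry: move ESP down by frame_size + 4 bytes.
   Returns false and leaves ESP unchanged if that would leave the block. */
bool stack_enter(stack_block *s, size_t frame_size) {
    if (s->esp < 4 || frame_size > s->esp - 4) {
        return false; /* would overflow the preallocated block */
    }
    s->esp -= frame_size + 4;
    return true;
}

/* Function exit: the inverse pointer arithmetic. The caller must pass
   the same frame_size that was used at entry. */
void stack_leave(stack_block *s, size_t frame_size) {
    s->esp += frame_size + 4;
}
```

In this model an overflow is an explicit `false` result; in the setting of Theorem 1 the verified bound is what guarantees that this failing branch is never reached.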

As an interesting side effect, accessing function arguments is
now simpler in our assembly language. Indeed, in the x86 calling
convention, a function has to look for its arguments in the stack
frame of its caller. For this purpose, the original CompCert keeps a
back pointer to link each stack frame to its parent. Thanks to our
changes, function arguments can now be provably accessed through
pointer arithmetic with no indirection, so that this back link is no
longer necessary. Because stack frames are independent memory blocks
in the original CompCert, it was necessary for the callee to
have a pointer to the caller stack frame, called the back link, in its
own stack frame. The callee could then access its arguments by
one indirection through this back link. In our new $\mathrm{ASM}_{sz}$
assembly language, stack frames are no longer independent, so that the
callee can access its arguments directly by pointer arithmetic within
the whole stack block.

* In CompCert Mach, the syntax of a program $p$ includes a finite map $SF$
such that, for any function $f$ defined in $p$, the operational semantics
of Mach allocates a stack frame of $SF(f)$ bytes whenever $f$ is entered.

3.3  Limitations 

Neither the original CompCert nor Quantitative CompCert supports
variable stack-frame sizes: C features such as variable-length
arrays or dynamic stack allocation (the alloca special library
function) are not supported. Thus, the size of the stack frame of
a Mach function can be computed statically, and can be used to
define the cost metric of the program. Moreover, the subsequently
produced assembly code does not need to use push or pop, so any
change to the stack pointer is done only through pointer arithmetic.

Quantitative CompCert currently does not support the following
two optional optimization passes (which are present in the original
CompCert): tail-call recognition and function inlining. We describe
how to deal with these two optimizations in the companion TR [9];
the implementation is underway.

4.  Quantitative  Hoare  Logic  for  CompCert  Clight 

In  this  section,  we  describe  the  novel  quantitative  program  logic 
for  CompCert  Clight.  The  logic  has  been  formalized  and  proved 
sound  using  Coq.  At  some  points,  we  simplify  the  presented  logic 
in  comparison  to  the  implemented  version  to  discuss  general  ideas 
instead  of  technical  details. 

Some  particularities  of  the  logic  can  be  better  understood  with 
respect  to  Clight  and  the  continuation-based  small-step  semantics 
for  Clight  programs  that  is  used  in  CompCert. 

4.1  CompCert  Clight 

CompCert  Clight  is  the  most  abstract  intermediate  language  used 
by  CompCert.  Mainly,  it  is  a  subset  of  C  in  which  loops  can  only 
be  exited  with  a  break  statement  and  expressions  are  free  of  side 
effects.  Using  Clight  instead  of  C  simplifies  the  definition  of  our 
quantitative  program  logic  and  is  also  in  line  with  the  design  of 
CompCert  and  the  verification  of  CertiKOS. 

Syntax  We use Clight expressions in the logic. Our statements'
syntax is a subset of Clight's to focus on the main ideas of our
program logic. For simplicity, loops are infinite unless they are
terminated using a break statement. We do not consider function pointers,
goto statements, continue statements, and switch statements (see
Section 4.4).

$$S, S_1, S_2 ::= \mathtt{skip} \mid x = E \mid x = f(E^*) \mid S_1; S_2 \mid \mathtt{loop}\ S \mid \mathtt{if}\ (E)\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 \mid \mathtt{break} \mid \mathtt{return}\ E$$

As in C, a program consists of a list of global variable declarations,
a list of function declarations, and the identifier of the main
statement, which is the entry point of the program.

4.2  Operational  Semantics 

CompCert  Clight’s  semantics  is  based  on  small-step  transitions 
and  continuations.  Expressions — which  do  not  have  side  effects — 
are  evaluated  in  a  big-step  fashion.  We  use  a  simplified  version  of 
Clight’s  semantics  that  is  sufficient  for  our  subset.  It  is  easy  to  relate 
evaluations  in  our  simplified  version  to  evaluations  in  the  original 
semantics  and  we  have  implemented  a  verified  compiler  from  our 
simple  Clight  to  Clight  with  CompCert’s  original  semantics. 

Values and Memory Model  A value is either an integer $n$ or a
memory address $\ell$.

$$\mathit{Val} ::= \mathtt{int}\ n \mid \mathtt{adr}\ \ell$$

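In the Coq development we use CompCert's memory model. However,
the main ideas of the logic can be described with a simple
memory model in which locations are mapped to values and labels.

$$H : \mathit{Mem} = \mathit{Loc} \to \mathit{Val} \cup \{\blacksquare\}$$

The label $\blacksquare$ is used to indicate that a location has been freed
and can no longer be used.

$$\frac{}{\Gamma \vdash \{Q^s\}\ \mathtt{skip}\ \{Q\}}\ \text{(Q:Skip)}
\qquad
\frac{}{\Gamma \vdash \{Q^b\}\ \mathtt{break}\ \{Q\}}\ \text{(Q:Break)}$$

$$\frac{}{\Gamma \vdash \{\lambda\sigma.\ Q^r([\![E]\!]_\sigma)\,\sigma\}\ \mathtt{return}\ E\ \{Q\}}\ \text{(Q:Return)}
\qquad
\frac{\Gamma \vdash \{I^s\}\ S\ \{I\}}{\Gamma \vdash \{I^s\}\ \mathtt{loop}\ S\ \{(I^b, \bot, I^r)\}}\ \text{(Q:Loop)}$$

$$\frac{\Gamma(f) = (P_f, Q_f) \qquad P = \lambda(\theta,H).\ P_f([\![E]\!]_{(\theta,H)}, H) \qquad Q = \lambda(\theta,H).\ Q_f(\theta(x), H)}{\Gamma \vdash \{P + M(f)\}\ x = f(E)\ \{(Q + M(f), \bot, \bot)\}}\ \text{(Q:Call)}$$

$$\frac{\Gamma \vdash \{P\}\ S_1\ \{(R, Q^b, Q^r)\} \qquad \Gamma \vdash \{R\}\ S_2\ \{Q\}}{\Gamma \vdash \{P\}\ S_1; S_2\ \{Q\}}\ \text{(Q:Seq)}$$

$$\frac{\Gamma \vdash \{P\}\ S\ \{Q\} \qquad c \ge 0}{\Gamma \vdash \{P + c\}\ S\ \{Q + c\}}\ \text{(Q:Frame)}
\qquad
\frac{P \ge P' \qquad \Gamma \vdash \{P'\}\ S\ \{Q'\} \qquad Q' \ge Q}{\Gamma \vdash \{P\}\ S\ \{Q\}}\ \text{(Q:Conseq)}$$

Figure 4. Selected rules of the quantitative program logic.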

Evaluating Expressions  Expressions are evaluated with respect
to a memory $H : \mathit{Mem}$ and two environments

$$\theta : \mathit{VID} \to \mathit{Val} \quad\text{and}\quad \Delta : \mathit{VID} \to \mathit{Loc}.$$

The local environment $\theta$ maps local variables to values and the
global environment $\Delta$ maps global variables to locations. We
always assume that $\mathrm{dom}(\Delta) \cap \mathrm{dom}(\theta) = \emptyset$.

The semantics $[\![E]\!]^{\Delta}_{(\theta,H)} = v$ of an expression $E$
under a global environment $\Delta$, a local environment $\theta$, and a
memory $H$ is defined by induction on the structure of $E$.

Continuations  The small-step transition relation for statements is
based on continuations. Continuations handle the local control flow
within a function as well as the logical call stack.

$$K ::= \mathtt{Kstop} \mid \mathtt{Kseq}\ S\ K \mid \mathtt{Kloop}\ S\ K \mid \mathtt{Kcall}\ x\ f\ \theta\ K$$

A continuation $K$ is either the empty continuation Kstop, a sequence
Kseq $S$ $K$, a loop Kloop $S$ $K$, or a stack frame Kcall $x$ $f$ $\theta$ $K$.
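To make the role of continuations as a logical call stack more tangible, here is a minimal C sketch of the continuation shape. It is an illustration only: the names `kont`, `kont_kind`, and `call_depth` are hypothetical, and the payloads of each constructor (the statement $S$, the variable $x$, the function $f$, and the saved local environment $\theta$) are deliberately omitted.

```c
#include <stddef.h>

/* Hypothetical, payload-free rendering of the four continuation forms. */
typedef enum { K_STOP, K_SEQ, K_LOOP, K_CALL } kont_kind;

typedef struct kont {
    kont_kind kind;
    const struct kont *next; /* enclosing continuation; NULL for K_STOP */
} kont;

/* Count the K_CALL nodes in a continuation, that is, the number of
   stack frames currently recorded in the logical call stack. */
size_t call_depth(const kont *k) {
    size_t depth = 0;
    for (; k != NULL && k->kind != K_STOP; k = k->next) {
        if (k->kind == K_CALL) {
            depth++;
        }
    }
    return depth;
}
```

Only the `K_CALL` nodes correspond to stack frames; `K_SEQ` and `K_LOOP` nodes describe control flow inside a single function and do not contribute to the depth.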

Evaluating Statements  Statements are evaluated under a program
state $(\theta, H) \in \mathit{State} = (\mathit{VID} \to \mathit{Val}) \times \mathit{Mem}$
and a global environment

$$(\Sigma, \Delta) : (\mathit{FID} \to ([\mathit{VID}] \times S)) \times (\mathit{VID} \to \mathit{Loc})$$

that maps internal functions to their definitions (a list of argument
names and the function body) and global variables to locations.

The small-step evaluation rules are given in the companion
TR [9]. They define a transition

$$(\Sigma, \Delta) \vdash (S, K, \sigma) \xrightarrow{a} (S', K', \sigma')$$

where the label $a$ is a memory event $\mu$, an I/O event $\nu$, or
$\epsilon$ (no event), $S, S'$ are statements, $K, K'$ are continuations,
and $\sigma, \sigma' \in \mathit{State}$.

From the small-step transition relation we derive the following
many-step relation in which $t$ is a finite trace. We write

$$(\Sigma, \Delta) \vdash (S_1, K_1, \sigma_1) \xrightarrow{t}_{n} (S_{n+1}, K_{n+1}, \sigma_{n+1})$$

if $t = a_1, \ldots, a_n$ and there exist $(S_i, K_i, \sigma_i)$ such that for all $i$

$$(\Sigma, \Delta) \vdash (S_i, K_i, \sigma_i) \xrightarrow{a_i} (S_{i+1}, K_{i+1}, \sigma_{i+1}).$$

For a statement $S$ and a continuation $K$, we define the weight
under the global environment $(\Sigma, \Delta)$, the program state $\sigma$, and
the metric $M$ as

$$(\Sigma, \Delta) \vdash W_{(\sigma,M)}(S, K) = \sup\{\, W_M(t) \mid \exists S', K', \sigma', n.\ (\Sigma, \Delta) \vdash (S, K, \sigma) \xrightarrow{t}_{n} (S', K', \sigma') \,\}.$$
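The weight of a single finite trace can be sketched in C as follows. This is a hedged illustration rather than the Coq definition: it assumes that a trace consists only of call and return events, that a call to $f$ adds $M(f)$ to a running total and the matching return subtracts it, and that the weight is the peak of that running total over all prefixes. The names `event`, `event_kind`, and `trace_weight` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical event type: a call to or a return from function `fid`. */
typedef enum { EV_CALL, EV_RET } event_kind;

typedef struct {
    event_kind kind;
    int fid; /* index into the metric table */
} event;

/* Weight of a trace under a metric: the maximum, over all prefixes,
   of the sum of call costs minus the sum of return costs.
   metric[fid] is assumed to hold M(f) for the function with that index. */
long trace_weight(const event *trace, size_t n, const long *metric) {
    long current = 0; /* stack in use after the prefix seen so far */
    long peak = 0;    /* the empty prefix has weight 0 */
    for (size_t i = 0; i < n; i++) {
        long cost = metric[trace[i].fid];
        current += (trace[i].kind == EV_CALL) ? cost : -cost;
        if (current > peak) {
            peak = current;
        }
    }
    return peak;
}
```

With two functions of cost 40 and 88, running them one after the other peaks at 88, while nesting the second call inside the first peaks at 128. The weight of a statement is then the supremum of this quantity over all traces the statement can produce.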

4.3  Quantitative  Hoare  Logic 

In the following we describe a simplified version of the quantitative
Hoare logic that we use in Coq to prove bounds on the weights
of the traces of Clight programs. For a given statement $S$ and a
continuation $K$, our goal is to derive a bound
$(\Sigma, \Delta) \vdash P(\sigma, M) \in \mathbb{N}$ such that
$(\Sigma, \Delta) \vdash P(\sigma, M) \ge (\Sigma, \Delta) \vdash W_{(\sigma,M)}(S, K)$
for all program states $\sigma$ and resource metrics $M$. In the remainder of
this section we assume a fixed global environment $(\Sigma, \Delta)$.

We generalize classic Hoare logic to express not only classical
boolean-valued assertions but also assertions that talk about the
future stack-space usage. Instead of the usual assertions
$P : \mathit{State} \to \mathit{bool}$ of Hoare logic we use assertions

$$P : \mathit{State} \to \mathbb{N} \cup \{\infty\}.$$

This can be understood as a refinement of boolean assertions where
false is interpreted by $\infty$ and true is refined by $\mathbb{N}$. We write
$\mathit{Assn}$ for $\mathit{State} \to \mathbb{N} \cup \{\infty\}$, and
$\bot = (\_ \mapsto \infty)$. In the actual implementation, assertions have
the type $\mathit{State} \to \mathbb{N} \to \mathit{Prop}$. For
a given $\sigma \in \mathit{State}$, such an assertion can be seen as a set
$B \subseteq \mathbb{N}$ of valid bounds. We do this only to use Coq's
support for propositional reasoning. The presentation here is easier to read.
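The value domain $\mathbb{N} \cup \{\infty\}$ and the embedding of boolean assertions can be sketched in C. This is an illustrative encoding, not the Coq one: the type `potential`, the constant `POT_INF`, and the helper functions are hypothetical names, and using the largest 64-bit value to stand for infinity (with saturating addition) is an assumption made to keep the sketch small.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of N extended with infinity. */
typedef uint64_t potential;
#define POT_INF UINT64_MAX /* stands for infinity, i.e. "false" */

/* Addition lifted to the extended naturals; anything that reaches the
   encoding of infinity saturates to infinity. */
potential pot_add(potential a, potential b) {
    if (a == POT_INF || b == POT_INF) {
        return POT_INF;
    }
    if (a > POT_INF - 1 - b) {
        return POT_INF; /* finite sum no longer fits below POT_INF */
    }
    return a + b;
}

/* The quantitative ordering: a covers b when a >= b. Infinity covers
   everything, just as "false" implies every assertion. */
bool pot_ge(potential a, potential b) {
    return a >= b;
}

/* Embedding of a boolean assertion: true needs no potential,
   false is the unsatisfiable bound. */
potential pot_of_bool(bool holds) {
    return holds ? 0 : POT_INF;
}
```

An assertion in the sense of the text would then be a function from program states to `potential`, and the bottom assertion is the one that returns `POT_INF` on every state.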

The continuation-based semantics of Clight requires that we
distinguish pre- and postconditions in the logic to account for
different possible ways to exit a block of code. This approach
is standard in Hoare logics and followed for instance in Appel's
separation logic for Clight [3]. Our postconditions

$$Q = (Q^s, Q^b, Q^r) : \mathit{Assn} \times \mathit{Assn} \times (\mathit{Val} \to \mathit{Assn})$$

provide one assertion $Q^s$ for the case in which the block is exited by
fall through, one assertion $Q^b$ if the block is exited by a break, and
a function $Q^r$ from values to assertions in case the block is exited
by a return. The function $Q^r$ takes the return value as argument.

Since we have to deal with (possibly recursive) functions, we
also need a function context

$$\Gamma : \mathit{FID} \to ((\mathit{Val} \times \mathit{Mem}) \to \mathbb{N} \cup \{\infty\}) \times ((\mathit{Val} \times \mathit{Mem}) \to \mathbb{N} \cup \{\infty\})$$

that maps function names to their specifications, that is, pre- and
postconditions. The precondition depends on the value that is passed
to the function by the caller and the memory. The postcondition
depends on the return value and the memory. We assume that a
function has only one argument in this article. In the Coq implementation,
an arbitrary number of function arguments is allowed.

In summary, a quantitative Hoare triple has the form
$\Gamma \vdash \{P\}\ S\ \{Q\}$ where $\Gamma$ is a function context,
$P : \mathit{Assn}$ is a precondition,
$Q : \mathit{Assn} \times \mathit{Assn} \times (\mathit{Val} \to \mathit{Assn})$
is a postcondition, and $S$ is a statement.

Intuitively, an assertion can be seen as a potential function that
maps a program state to a non-negative potential. The potential
of the precondition $P$ must be sufficient to cover the cost of the
execution of the statement $S$ and the potential $Q$ after the execution
of $S$ (as in amortized resource analysis [19]).

Rules  Figure 4 shows selected rules of the quantitative logic. We
lift the operation $+$ and the comparison $\ge$ pointwise to assertions
$P, Q : \mathit{Assn}$. A constant $c \in \mathbb{N} \cup \{\infty\}$ is
sometimes used as the constant assertion $\_ \mapsto c$. We fix an event
metric $M$ and a global environment $(\Sigma, \Delta)$.

In the Q:Skip rule, we do not have to account for any stack
consumption. As a result, the precondition can be any (potential)
function. After the execution, the skip part of the postcondition must
be valid on the same (unchanged) program state. So we have to
make sure that we do not end up with more potential and simply use
the precondition as the skip part of the postcondition. The break and
return parts of the postcondition are not reachable and can therefore
be arbitrary. The rules Q:Break and Q:Return are similar.

Derivation for f():

$$\dfrac{\dfrac{\dfrac{\Gamma(f) = (\lambda(v,H).0,\ \lambda(v,H).0)}{\Gamma \vdash \{m_f\}\ f()\ \{(m_f, \bot, \bot)\}}\ \text{(Q:Call)}}{\Gamma \vdash \{m_f + x_f\}\ f()\ \{(m_f + x_f, \bot, \bot)\}}\ \text{(Q:Frame)}}{\Gamma \vdash \{\max(m_f, m_g)\}\ f()\ \{Q\}}\ \text{(EQ)}$$

Derivation for g():

$$\dfrac{\dfrac{\dfrac{\Gamma(g) = (\lambda(v,H).0,\ \lambda(v,H).0)}{\Gamma \vdash \{m_g\}\ g()\ \{(m_g, \bot, \bot)\}}\ \text{(Q:Call)}}{\Gamma \vdash \{m_g + x_g\}\ g()\ \{(m_g + x_g, \bot, \bot)\}}\ \text{(Q:Frame)}}{\Gamma \vdash \{\max(m_f, m_g)\}\ g()\ \{Q\}}\ \text{(EQ)}$$

Conclusion:

$$\dfrac{\Gamma \vdash \{\max(m_f, m_g)\}\ f()\ \{Q\} \qquad \Gamma \vdash \{\max(m_f, m_g)\}\ g()\ \{Q\}}{\Gamma \vdash \{\max(m_f, m_g)\}\ f(); g()\ \{Q\}}\ \text{(Q:Seq)}$$

where $M(\mathrm{call}(f)) = m_f$, $M(\mathrm{call}(g)) = m_g$,
$Q = (\max(m_f, m_g), \bot, \bot)$, and
$x_d = \max(m_f, m_g) - m_d$ for $d \in \{f, g\}$.

Figure 5. An example derivation of a stack-space bound in the quantitative logic.

In the Q:Seq rule we have to account for early exits in statements.
For instance, if $S_1$ contains a break statement then $S_2$ will never be
executed, so we must ensure in the break part of $S_1$'s postcondition
that the break part of $S_1; S_2$ holds. For the same reason, the return
part of $S_1$'s postcondition is special.

The Q:Loop rule uses the same principles as the Q:Seq rule
to tweak the final postcondition. In the case of Q:Loop, we simply
ensure that the break part of the inner statement becomes the skip
part of the overall statement. We use $\bot$ as the break part of the
loop $S$ statement since its operational semantics prevents it from
terminating other than with a skip or a return.

The Q:Call rule accounts for the actual stack-space usage of
programs. It enforces that enough stack space is available to call the
function $f$ by adding $M(f)$ to the pre- and postcondition. The pre-
and postconditions are taken from the function context $\Gamma$.

There are two weakening rules in the quantitative Hoare logic.
The framing rule Q:Frame weakens a statement by stating that if
$S$ needs $P$ bytes to run and leaves $Q$ bytes free at its end, then it can
very well run with $P + c$ bytes and return $Q + c$ bytes. It is very
handy for proving tight bounds using the max function, as demonstrated
in Figure 5. The consequence rule Q:Conseq is directly imported
from classical Hoare logics except that instead of using the logical
implication $\Rightarrow$ we use the quantitative ordering $\ge$ on assertions.

Auxiliary State  The main difference between the implemented
logic and the logic described here is that the latter does not have
an auxiliary state. Auxiliary state is a classic extension of Hoare
logic (see for example [30]). The auxiliary state is used to share
information between the pre- and postcondition of a triple. In a logic
without auxiliary state (or similar techniques) it is not possible to
relate program states before and after a statement. For example, you
cannot specify that the function int twice() { i = i+i; }
doubles the value of the variable i. With an auxiliary variable
$Z$ it is possible to specify this fact in Hoare logic using the triple
$\{i = Z\}\ \mathtt{twice()}\ \{i = 2 \cdot Z\}$.

One technical challenge with this auxiliary state is that some
triples, for example $\{i = Z\}\ c\ \{i = Z\}$ and
$\{i = Z - 1\}\ c\ \{i = Z - 1\}$, need to be proved equivalent by the
logic to handle recursive calls. This problem is usually solved by
introducing a more complex consequence rule, which our implemented
system features. The typical use case is when we apply the rule Q:Call
to a recursive call. In this case, the Hoare triple for the function call
is proved by an assumption from the derivation context with a slightly
different auxiliary state. In the example derivation in Figure 6 this
different state is $Z - 1$. Adapting the derivation hypothesis to prove the
recursive call is enabled in our logic by an extended consequence
rule that we proved sound in the quantitative setting.

Stack Framing  Another minor difference is in the function
application rule where we only present the rule for function calls with a
single argument and without framing of stack assertions. The latter
is necessary to carry over information on the local environment from
the precondition to the postcondition of the function call.

    $\{Z = \log_2(h_\sigma - l_\sigma) \wedge M_b \cdot Z\}$
    bsearch(x,l,h) {
      if (h-l <= 1) return l;
      $\{(Z > 0 \wedge Z = \log_2(h_\sigma - l_\sigma)) \wedge M_b \cdot Z\}$
      m = (h+l)/2;
      $\{(Z > 0 \wedge Z = \log_2(h_\sigma - l_\sigma) \wedge m_\sigma = (h_\sigma + l_\sigma)/2) \wedge M_b \cdot Z\}$
      if (a[m]>x) h=m else l=m;
      $\{[Z - 1 = \log_2(h_\sigma - l_\sigma) \wedge M_b \cdot (Z - 1)] + M_b\}$
      return bsearch(x,l,h);
      $\{[M_b \cdot (Z - 1)] + M_b\}$
    }
    $\{M_b \cdot Z\}$

Figure 6. Derivation with auxiliary state for the bsearch function.

Soundness  The soundness of our quantitative logic can be simply
expressed by the following theorem.

Theorem 2. For a fixed global environment $(\Sigma, \Delta)$, a derivation
in our quantitative logic for a statement $S$ implies a bound on the
weight of $S$, that is,

$$\cdot \vdash \{P\}\ S\ \{Q\} \implies \forall \sigma, M.\ P(\sigma, M) \ge W_{(\sigma,M)}(S, \mathtt{Kstop}).$$

Naturally, we have to prove a stronger statement that takes
postconditions and continuations into account to justify the soundness
of the rules of the logic. This is similar to program logics for
low-level code [22] and Hoare-style logics for CompCert Clight [3].
Furthermore, we have to assume that we have a non-empty function
context $\Gamma$; and finally, we have to step-index the correctness
statement in order to prove its soundness by induction. The details can
be found in the TR [9] and in the Coq development. Of course we
prove in Coq that the intuitive validity, as formulated in Theorem 2,
is a consequence of our stronger formulation of validity.

Example  Figure 5 contains an example derivation for the
statement f(); g() in our logic. We assume that we have already verified
that the function bodies of f and g do not allocate stack space, that
is, $\Gamma(g) = \Gamma(f) = (\lambda(v,H).0,\ \lambda(v,H).0)$.

Our goal is to derive a quantitative Hoare triple that expresses
that $\max(m_f, m_g)$, the maximum of the stack frame sizes of f
and g, is a bound on the stack usage; and that after the execution
$\max(m_f, m_g)$ stack space is available. Since the effect of break
and return statements cannot leak outside of a function body, the
corresponding postconditions can be arbitrary and we simply use $\bot$.

To derive our goal, we first have to apply the rule Q:Seq for
sequential composition. In the derivation of the function call f(),
we first reorder the precondition to get it in a form in which we can
apply the rule Q:Frame to eliminate the max operator. We then
have a triple that is amenable to an application of the rule Q:Call
that uses the specification of the body of f in $\Gamma$.
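As a concrete instance of this derivation, the following worked calculation plugs in illustrative frame sizes. The values $m_f = 40$ and $m_g = 88$ are assumptions chosen only for the example and do not come from the figure; $x_f$ and $x_g$ denote the amounts added by the frame rule, $x_d = \max(m_f, m_g) - m_d$.

```latex
\begin{align*}
x_f &= \max(40, 88) - 40 = 48, \qquad x_g = \max(40, 88) - 88 = 0 \\
\Gamma &\vdash \{40\}\ f()\ \{(40, \bot, \bot)\}
  && \text{(Q:Call)} \\
\Gamma &\vdash \{40 + 48\}\ f()\ \{(40 + 48, \bot, \bot)\}
  && \text{(Q:Frame with } c = x_f\text{)} \\
\Gamma &\vdash \{88\}\ g()\ \{(88, \bot, \bot)\}
  && \text{(Q:Call, then Q:Frame with } c = x_g = 0\text{)} \\
\Gamma &\vdash \{88\}\ f();\, g()\ \{(88, \bot, \bot)\}
  && \text{(Q:Seq)}
\end{align*}
```

Under these assumed sizes the sequence needs 88 bytes, not 40 + 88, because the frame of f is released before g is called.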
4.4 Limitations

In our program logic described in this section, we do not consider
function pointers, goto statements, continue statements, and switch
statements, even though our Quantitative CompCert compiler still
supports all of these. It would be possible to add these features to
our logic by building on the ideas of advanced program logics like
XCAP [29].

5.  Automatic  Stack  Analyzer 

In  larger  C  programs  a  manual,  interactive  verification  with  a 
program  logic  is  too  tedious  and  time-consuming  to  be  practical. 
Therefore  we  have  developed  an  automatic  stack  analysis  tool  that 
operates  at  the  Clight  level  to  enable  the  analysis  of  real  system 
code.  We  view  this  automatic  tool  mainly  as  a  proof  of  concept  that 
demonstrates  the  value  of  the  logic  for  formal  verification  of  static 
analysis  tools.  In  the  future,  we  will  extend  our  automatic  analyzer 
with  advanced  techniques  like  amortized  resource  analysis  [5,  21]. 
This  is  however  beyond  the  scope  of  this  article. 

The basic idea of our automatic stack analyzer is to compute
a call graph from the Clight code and to derive a stack bound
for each function in topological order. In Coq, the derivation of a
function bound is implemented by a recursive function auto_bound
on the abstract syntax tree (AST) of a Clight program. The function
auto_bound computes not only a stack bound but also a
derivation in our quantitative program logic. This verifies the
correctness of the generated bound and enables the composition
of stack bounds that have been derived interactively or with other
static analysis tools. In addition to the AST, auto_bound takes a
context of known function bounds together with their derivations in
the logic as an argument.

Given our verified quantitative logic, the implementation of
auto_bound is straightforward. For trivial commands like
assignments or skip, auto_bound simply generates the bound 0
and a derivation like $\{0\}\ \mathtt{skip}\ \{(0, 0, 0)\}$. For a sequential
composition $S_1; S_2$ we inductively apply auto_bound to $S_1$ and $S_2$,
and derive the bounds $\{B_i\}\ S_i\ \{(B_i^s, B_i^b, B_i^r)\}$ for $i = 1, 2$.
We then return the precondition $\max\{B_1, B_2\}$ and the postcondition
$(\max\{B_1^s, B_2^s\}, \max\{B_1^b, B_2^b\}, \max\{B_1^r, B_2^r\})$ for the
command $S_1; S_2$. The derivation of this bound is similar to the example
derivation that is sketched in Figure 5. The computation of the bound
for the conditional works similarly. For loops we can use the bound
derived for the loop body to obtain a bound for the loop. In the
derivation we just apply the rule Q:Loop. Function calls are
handled with the context of known function bounds (recursion is not
allowed here) and the rule Q:Call.
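The numeric part of this recursion can be sketched in C. The sketch is a simplification under stated assumptions: it computes only the precondition bound, whereas the Coq function also builds the postcondition triple and the derivation in the logic. The AST type `stmt`, the tables `metric` and `known`, and this C rendering of `auto_bound` are hypothetical and are not the interface of the Coq development.

```c
#include <stddef.h>

/* Hypothetical miniature AST for the statement subset of Section 4.1. */
typedef enum {
    ST_SKIP, ST_ASSIGN, ST_CALL, ST_SEQ, ST_IF, ST_LOOP, ST_BREAK, ST_RETURN
} stmt_kind;

typedef struct stmt {
    stmt_kind kind;
    int callee;               /* ST_CALL only: index of the called function */
    const struct stmt *a, *b; /* children: SEQ/IF use both, LOOP uses a */
} stmt;

static unsigned long max_ul(unsigned long x, unsigned long y) {
    return x > y ? x : y;
}

/* metric[f] is assumed to hold M(f); known[f] holds the bound already
   derived for the body of f. Callees must be analyzed first (topological
   order), so recursion in the analyzed program is not supported. */
unsigned long auto_bound(const stmt *s, const unsigned long *metric,
                         const unsigned long *known) {
    switch (s->kind) {
    case ST_CALL: /* Q:Call: frame cost plus the callee's own bound */
        return metric[s->callee] + known[s->callee];
    case ST_SEQ:  /* branches and sequenced parts reuse the same space */
    case ST_IF:
        return max_ul(auto_bound(s->a, metric, known),
                      auto_bound(s->b, metric, known));
    case ST_LOOP: /* Q:Loop: the body's bound is the loop's bound */
        return auto_bound(s->a, metric, known);
    default:      /* skip, assignment, break, return: no stack needed */
        return 0;
    }
}
```

A caller would fill `known` one function at a time, storing the result of `auto_bound` on each function body before analyzing the functions that call it.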

We envision that the quantitative logic can be a useful backend to
verify more sophisticated static analyses. For our simple, automatic
stack analyzer the logic was already very convenient and enabled us
to verify the analyzer almost without additional effort.

We  have  combined  our  automatic  stack  analyzer  with  our  Quan¬ 
titative  CompCert  compiler.  The  result  is  a  verified  C  compiler  that 
translates  a  program  without  function  pointers  and  recursive  calls 
to  x86  assembly  and  automatically  derives  a  stack  bound  for  each 
function  in  the  program  including  main().  The  soundness  theorem 
we  have  proved  states  the  following.  If  a  given  program  is  memory- 
safe  and  the  verified  compiler  successfully  produces  an  assembly 
program  A  then  A  refines  the  source  program  and  runs  safely  on 
an  x86  machine  with  the  stack  size  that  has  been  computed  by  the 
automatic  stack  analysis  for  main()  (see  Point  3  of  Theorem  1). 

6.  Experimental  Evaluation 

To validate the practicality of our framework for stack-bound
verification, we have performed an experimental evaluation with
more than 3000 lines of C code from different sources including
handwritten code, programs from the CompCert test suite, programs
from the MiBench [17] embedded software benchmark suite, and
modules from the simplified development version of the CertiKOS
operating system kernel, which is currently being verified.

File Name / Line Count      Function Name         Verified Stack Bound
--------------------------  --------------------  --------------------
mibench/net/dijkstra.c      enqueue               40 bytes
(174 LOC)                   dequeue               40 bytes
                            dijkstra              88 bytes
mibench/auto/bitcount.c     bitcount              16 bytes
(110 LOC)                   bitstring             32 bytes
mibench/sec/blowfish.c      BF_encrypt            40 bytes
(233 LOC)                   BF_options            8 bytes
                            BF_ecb_encrypt        80 bytes
mibench/sec/pgp/md5.c       MD5Init               16 bytes
(335 LOC)                   MD5Update             168 bytes
                            MD5Final              168 bytes
                            MD5Transform          128 bytes
mibench/tele/fft.c          IsPowerOfTwo          16 bytes
(195 LOC)                   NumberOfBitsNeeded    24 bytes
                            ReverseBits           24 bytes
                            fft_float             160 bytes
certikos/vmm.c              palloc                48 bytes
(608 LOC)                   pfree                 40 bytes
                            mem_init              72 bytes
                            pmap_init             176 bytes
                            pt_free               80 bytes
                            pt_init               152 bytes
                            pt_init_kern          136 bytes
                            pt_insert             80 bytes
                            pt_read               56 bytes
                            pt_resv               120 bytes
certikos/proc.c             enqueue               48 bytes
(819 LOC)                   dequeue               48 bytes
                            kctxt_new             72 bytes
                            sched_init            232 bytes
                            tdqueue_init          208 bytes
                            thread_init           192 bytes
                            thread_spawn          96 bytes
compcert/mandelbrot.c       main                  56 bytes
(92 LOC)
compcert/nbody.c            advance               80 bytes
(174 LOC)                   energy                56 bytes
                            offset_momentum       24 bytes
                            setup_bodies          16 bytes
                            main                  112 bytes

Table 1. Automatically verified stack bounds for C functions.

Tables 1 and 2 show a representative compilation of the
experiments. Table 1 contains bounds that were automatically derived
with the stack analyzer. Table 2 contains 8 bounds that were
interactively derived using the quantitative logic with occasional support
of the automation. The size of the analyzed example files varies
from 8 lines of code (fib.c) to 819 lines of code (proc.c). In general,
the automatic stack-bound analysis runs very efficiently and needs
less than a second for every example file on (one core of) a Linux
workstation with 32 GB of RAM and an x86 processor with 16 cores at
3.10 GHz.

In Table 1, the first column shows the file name of the examples
together with the number of lines, the second column contains the
name of selected functions from that file, and the third column
contains the verified bound. The interactively derived bounds in Table 2
are presented as symbolic expressions parametric in the functions'
arguments. These symbolic expressions are slight simplifications of
the real pre- and postconditions of the functions that we proved in
Coq. The actual Hoare triples proved in Coq carry a logical meaning
which requires, for instance, that the qsort function be called
on a valid sub-array. The file sizes of the manually verified examples
range from 8 to 52 lines of code.

Function Name                Verified Stack Bound
---------------------------  ----------------------------------------------
recid()                      8a bytes
bsearch(x, lo, hi)           40(1 + log2(hi - lo)) bytes
fib(n)                       24n bytes
qsort(a, lo, hi)             48(hi - lo) bytes
filter_pos(a, sz, lo, hi)    48(hi - lo) bytes
sum(a, lo, hi)               32(hi - lo) bytes
fact_sq(n)                   40 + 24n^2 bytes
filter_find(a, sz, lo, hi)   128 + 48(hi - lo) + 40 log2(BL) bytes

Table 2. Manually verified stack bounds for C functions.
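To illustrate how such a symbolic bound can be compared with actual behavior, the following C sketch evaluates the bsearch bound from Table 2 and measures the recursion depth of a binary search shaped like the one in Figure 6. Several points are assumptions made for the sketch: log2 is read as the ceiling of the binary logarithm, each recursive activation is charged 40 bytes (the coefficient in Table 2), the range must satisfy hi - lo >= 1, and the helper names (`ceil_log2`, `bsearch_bound`, `bsearch_rec`, `measured_stack`) are hypothetical. It counts activations instead of reading the real stack pointer.

```c
#include <stddef.h>

/* Smallest k with 2^k >= n, for n >= 1 (assumed reading of log2). */
unsigned ceil_log2(unsigned long n) {
    unsigned k = 0;
    unsigned long p = 1;
    while (p < n) {
        p *= 2;
        k++;
    }
    return k;
}

/* Symbolic bound from Table 2: 40(1 + log2(hi - lo)) bytes. */
unsigned long bsearch_bound(unsigned long lo, unsigned long hi) {
    return 40UL * (1 + ceil_log2(hi - lo));
}

/* Bookkeeping for the instrumented search below. */
static unsigned depth_now, depth_peak;

/* Recursive binary search that records its deepest nesting level. */
unsigned long bsearch_rec(const int *a, int x,
                          unsigned long l, unsigned long h) {
    unsigned long result;
    depth_now++;
    if (depth_now > depth_peak) {
        depth_peak = depth_now;
    }
    if (h - l <= 1) {
        result = l;
    } else {
        unsigned long m = (h + l) / 2;
        if (a[m] > x) h = m; else l = m;
        result = bsearch_rec(a, x, l, h);
    }
    depth_now--;
    return result;
}

/* Modeled stack usage: 40 bytes per simultaneously live activation. */
unsigned long measured_stack(const int *a, int x,
                             unsigned long lo, unsigned long hi) {
    depth_now = 0;
    depth_peak = 0;
    bsearch_rec(a, x, lo, hi);
    return 40UL * depth_peak;
}
```

Under these assumptions the modeled usage never exceeds the symbolic bound, and the two coincide when hi - lo is a power of two, which is the kind of comparison a plot of derived bounds against measured usage makes visible.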

Our  main  application  of  the  automatic  stack-analyzer  is  the 
CertiKOS  operating  system  kernel  [15].  Currently,  the  stack  in 
CertiKOS  is  preallocated  and  proving  the  absence  of  stack-overflow 
is  essential  in  the  verification  of  the  reliability  of  the  system.  Since 
CertiKOS  does  not  use  recursion,  we  can  use  the  automatic  analysis 
to  derive  precise  stack  bounds.  Using  our  Quantitative  CompCert 
compiler,  we  were,  for  instance,  able  to  compile  and  compute  bounds 
for  the  virtual  memory  management  module  (certikos/vmm.c)  and 
the  process  management  module  (certikos/proc.c).  Because  of 
the  large  number  of  functions  in  CertiKOS,  only  a  sample  of  the 
analyzed  functions  is  displayed  in  Table  1. 

Testing the quantitative Hoare logic and the compiler on the
CompCert test suite was a natural choice since our compiler builds
on CompCert's architecture. This also allowed us to make sure
that we did not introduce any regression with respect to the
original CompCert compiler. To stress the expressivity of the logic we
focused on test programs with recursive functions. The functions
fib and qsort in Table 2 are for instance from the CompCert test
suite. Files with automatically derived bounds for non-recursive
functions from the CompCert test suite include mandelbrot.c, which
computes an approximation of the Mandelbrot set, and nbody.c,
which computes an n-body simulation of a part of our solar system.

We also made sure that our technique can handle safety-critical
embedded software. The MiBench [17] benchmark that we used
for this purpose is free, publicly available, and representative of
embedded software. The use of recursion in MiBench programs is
relatively rare, which makes them a great target for our automatic
stack analyzer. The analyzed examples we present in Table 1
include for instance Dijkstra's single-source shortest-path algorithm
(dijkstra.c), and the cryptographic algorithms Blowfish (blowfish.c)
and MD5 (md5.c).

Finally, Table 2 contains some recursive functions that demonstrate
the expressivity of our quantitative logic. The function bsearch
is, for example, a recursive binary search with logarithmic recursion
depth. The function fib computes the Fibonacci sequence using an
exponential algorithm and the function qsort implements a recursive
version of the quicksort algorithm. The verification of the function
fact_sq shows the modularity of the logic: We first verify a linear
bound for the factorial function and then use this bound to verify
fact_sq(n), which contains the call fact(n^2). The function filter_pos
takes an array and computes a new array that contains all positive
elements of the input array. Similarly, filter_find uses the binary
search bsearch to filter out all elements of an input array that are
contained in another array of size BL. The modularity of the logic
enables us to reuse the logarithmic bound that we already derived
for bsearch in the proof. The verification of some functions is still
underway. The bounds for the functions redd, bsearch, fib, and
qsort are already completely verified.

Figure 7. Experimental evaluation of the accuracy of hand-derived
stack bounds. The plots compare the derived bounds (blue lines)
for the functions bsearch (at the top) and fact_sq (at the bottom)
with the measured stack usage of the execution of the respective
function for different inputs (red crosses). The x-axis shows either
the value of an integer argument (fact_sq) or the length of an input
array (bsearch). The y-axis shows the stack usage in bytes.
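The logarithmic recursion depth of bsearch mentioned above can be illustrated with a small Python model. This is only a sketch under our own assumptions, not the verified Clight code from Table 2: the half-open-interval formulation, the helper names, and the constant FRAME_BYTES are hypothetical. The model counts how many frames a recursive binary search needs and compares that count with a closed-form logarithmic bound.

```python
FRAME_BYTES = 40  # hypothetical frame size; the real value depends on the compiled code


def bsearch(arr, key, lo, hi, depth=1):
    """Recursive binary search on arr[lo:hi].

    Returns (index or -1, number of frames that were live at the deepest call).
    """
    if lo >= hi:
        return -1, depth
    mid = lo + (hi - lo) // 2
    if arr[mid] == key:
        return mid, depth
    if key < arr[mid]:
        return bsearch(arr, key, lo, mid, depth + 1)
    return bsearch(arr, key, mid + 1, hi, depth + 1)


def frame_bound(n):
    """Upper bound on live frames for an input of length n: floor(log2 n) + 2 (1 for n = 0)."""
    return n.bit_length() + 1


def stack_bound_bytes(n):
    """Byte bound obtained by charging FRAME_BYTES for every frame."""
    return FRAME_BYTES * frame_bound(n)
```

In this model the bound is reached exactly when the key is smaller than every element, because the search then always descends into the larger half.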

Our  experiments  show  that  the  automatic  stack  analyzer  works 
effectively  for  our  main  application,  the  CertiKOS  OS  kernel.  The 
reason  is  that  we  designed  the  quantitative  logic  to  include  exactly 
the  subset  of  Clight  that  is  needed  for  CertiKOS.  It  turned  out  that 
this  subset  is  also  sufficient  for  many  examples  in  the  CompCert 
test  suite  and  the  MiBench  embedded  software  benchmarks.  If  a 
program  is  not  interactively  analyzable  in  our  logic  then  this  is  due 
to  unsupported  language  constructs  such  as  switch  statements  and 
function pointers. Many of these language features could easily be
supported by relatively small additions to the logic. The exception
is function pointers, which would require more work, following
for example XCAP [29].

We  have  evaluated  the  accuracy  of  the  verified  bounds  by 
comparing  them  with  the  actual  stack-space  consumption  of  the 
compiled  C  programs.  Our  experiments  show  that  our  framework 
is  expressive  enough  to  derive  very  tight  bounds  for  recursive  and 
non-recursive  programs.  All  manually  and  automatically  derived 
bounds  over-approximate  the  actual  stack-space  consumption  by 
exactly  4  bytes.  Figure  7  shows  the  results  of  two  representative 
experiments  with  hand-derived  bounds  for  recursive  programs.  The 
bound  derived  in  the  logic  is  plotted  together  with  the  actual  stack 
consumption  of  C  programs  measured  on  different  inputs. 

Measuring the stack consumption of C programs on modern
computers is not as trivial as we originally thought. The measurement
is complicated by some security mechanisms and unrestricted
manipulation of the stack pointer by the compiler. To this end, we
designed a small C program that uses the Linux system call ptrace.
Using this system call, our tool forks the monitored process as a
child and then executes it step by step while keeping track of its stack
consumption.

The  reason  for  the  4-byte  looseness  of  the  bounds  is  that  stack 
frames  always  reserve  four  bytes  for  a  potential  function  call:  The 
return  address  needs  to  be  pushed  by  a  call  instruction  in  the  callee. 
Obviously,  the  last  function  in  the  function  call  chain  does  not  call 
any  other  function.  So  these  four  bytes  remain  unused. 
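The arithmetic behind the 4-byte gap can be written down in a few lines of Python. This is a toy model under our own assumptions: the frame sizes are made up, and each size is taken to already include the four bytes reserved for a return address.

```python
RESERVED_FOR_CALL = 4  # bytes each frame reserves for the return address of a call it may make


def derived_bound(frame_sizes):
    """Bound from the logic: every frame on the call chain is charged in full."""
    return sum(frame_sizes)


def measured_usage(frame_sizes):
    """Observed usage: the last function makes no call, so its reserved bytes stay unused."""
    return sum(frame_sizes) - RESERVED_FOR_CALL


chain = [24, 16, 12]  # hypothetical frame sizes along one call chain
gap = derived_bound(chain) - measured_usage(chain)
```

Whatever the chain looks like, the gap in this model is the constant RESERVED_FOR_CALL, which matches the uniform over-approximation reported above.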

7.  Related  Work 

We now discuss research that is related to our contributions in verified
compilation, program logics, and automatic resource analysis.

Verified  Compilation  Soundness  proofs  of  compilers  have  been 
extensively  studied  and  we  focus  on  formally  verified  proofs  here. 
Klein  and  Nipkow  [24]  developed  a  verified  compiler  from  an 
object-oriented,  Java-like  language  to  JVM  byte  code.  Chlipala  [11] 
describes  a  verified  compiler  from  the  simply-typed  lambda  calculus 
to  an  idealized  assembly  language.  In  contrast  to  our  work,  the 
aforementioned  works  do  not  model  nor  preserve  quantitative 
properties  such  as  stack  usage. 

Our  verified  Quantitative  CompCert  compiler  is  an  extension  of 
the  CompCert  C  Compiler  [26,  27].  Despite  being  formally  verified, 
important  quantitative  properties  such  as  memory  and  time  usage 
of  programs  compiled  with  CompCert  have  still  to  be  verified  at 
the assembly level [6]. Admittedly, there exists a clever annotation
mechanism [6] in CompCert that allows assertions on program
states to be transported from the source level to the target machine
code. However, these assertions can only contain statements about
memory states but not bounds on the number of loop iterations
and/or recursion depth of functions. The novelty of our Quantitative
CompCert  extension  to  CompCert  is  that  it  enables  us  to  reason 
about  quantitative  properties  of  event  traces  during  compilation. 
Another  novelty  is  that  we  model  the  assembly  level  semantics 
more  realistically  by  using  a  finite  stack.  In  particular,  we  do 
not  have  to  use  pseudo  instructions  anymore.  This  is  similar  to 
CompCertTSO  [32].  However,  we  use  event  traces  to  get  guarantees 
on  the  size  of  the  stack  that  is  needed  to  ensure  refinement.  On  the 
other  hand,  it  is  always  possible  that  the  compiled  code  runs  out  of 
stack  space  in  CompCertTSO. 

In the context of the Hume language [18], Jost et al. [23]
developed  a  quantitative  semantics  for  a  functional  language  and 
related  it  to  memory  and  time  consumption  of  the  compiled  code 
for  the  Renesas  M32C/85U  embedded  micro-controller  architecture. 
In  contrast  to  our  work,  the  relation  of  the  compiled  code  with 
functional  code  is  not  formally  proved. 

Program  Logics  In  the  development  of  our  quantitative  Hoare 
logic  we  have  drawn  inspiration  from  mechanically  verified  Hoare 
logics.  Nipkow’s  [30]  description  of  his  implementations  of  Hoare 
logics  in  Isabelle/HOL  has  been  helpful  to  understand  the  interaction 
of  auxiliary  variables  with  the  consequence  rule.  The  consequence 
rule  we  use  in  our  Coq  implementation  is  a  quantitative  version  of 
a  consequence  rule  that  has  been  attributed  to  Martin  Hofmann  by 
Nipkow  [30].  Appel’s  separation  logic  for  CompCert  Clight  [3]  has 
been  a  blueprint  for  the  structure  of  the  quantitative  logic.  Since  we 
do  not  deal  with  memory  safety,  our  logic  is  much  simpler  and  it 
would  be  possible  to  integrate  it  with  Appel’s  logic.  The  continuation 
passing  style  that  we  use  in  the  quantitative  logic  is  not  only  used 
by  Appel  [3]  but  also  in  Hoare  logics  for  low-level  code  [22,  29]. 

There  exist  quantitative  logics  that  are  integrated  into  separation 
logic  [5,  20]  and  they  are  closely  related  to  our  quantitative  logic. 
However,  the  purpose  of  these  logics  is  slightly  different  since  they 
focus  on  the  verification  of  bounds  that  depend  on  the  shape  of 
heap data structures. Moreover, they are only defined for idealized
languages and do not provide any guarantees for compiled code.
Also  closely  related  to  our  logic  is  a  VDM-style  logic  for  reasoning 
about  resource  usage  of  JVM  byte  code  by  Aspinall  et  al.  [4].  Their 
logic  is  more  general  and  applies  to  different  quantitative  resources 
while  we  focus  on  stack  usage.  However,  it  is  unclear  how  realistic 
the  presented  resource  metrics  are.  On  the  other  hand,  our  logic 
applies  to  system  code  written  in  C,  is  verified  with  respect  to 
CompCert  Clight,  and  derives  bounds  for  x86  assembly. 

Resource  Analysis  There  exists  a  large  body  of  research  on 
statically  deriving  stack  bounds  on  low-level  code  [8,  10,  31]  as 
well  as  commercial  tools  such  as  the  Bound-T  Time  and  Stack 
Analyser and AbsInt’s StackAnalyzer [14]. We are, however, not
aware of any formally verified techniques. For high-level languages
there  exists  a  large  number  of  systems  for  statically  inferring  or 
checking  quantitative  requirements  such  as  stack  usage  [1,  12,  19, 
23].  However,  they  are  not  formally  verified  and  do  not  apply  to 
system  code  that  is  written  in  C.  For  C  programs,  there  exist  methods 
to  automatically  derive  loop  bounds  [16,  36]  but  the  proposed 
methods  are  not  verified  and  it  is  unclear  if  they  can  be  used  for 
computing  stack  bounds. 

We  are  only  aware  of  two  verified  quantitative  analysis  systems. 
Albert  et  al.  [2]  rely  on  the  KeY  tool  to  automatically  verify 
previously  inferred  loop  invariants,  size  relations,  and  ranking 
functions  for  Java  Card  programs.  However,  they  do  not  have  a 
formal  cost  semantics  and  do  not  verify  actual  stack  bounds.  Blazy 
et  al.  [7]  have  verified  a  loop  bound  analysis  for  CompCert’s  RTL 
intermediate language. It is however unclear how the presented
technique can be used to verify stack bounds or to formally translate
bounds to a lower level during compilation.

8.  Conclusion 

Embedded  software  has  always  been  a  target  of  verified  compilers. 
As  a  result,  aiding  verification  of  quantitative  properties  remains  a 
major  goal  for  verified  compilation.  In  one  of  the  earliest  articles 
[26]  on  CompCert,  Leroy  stated: 

“[...] it is hopeless to prove a stack memory bound on the
source program and expect this resource certification to carry
out to compiled code: stack consumption, like execution time,
is a program property that is not preserved by compilation.”

Ironically,  Leroy’s  groundbreaking  work  on  CompCert  has  been  the 
main  inspiration  in  our  development  of  a  framework  that  enables 
exactly  such  a  resource  certification  of  stack-consumption  bounds 
for  compiled  x86  assembly  code  at  the  C  level. 

We  have  developed  Quantitative  CompCert,  a  realistic,  verified 
C  compiler  which  shows  how  verified  compilation  enables  the 
verification  of  quantitative  properties  of  compiled  programs  at  the 
source  level.  We  have  implemented  and  formally  verified  a  novel 
quantitative  Hoare  logic  for  CompCert  Clight  which  is  an  ideal 
backend  for  static  analysis  tools.  This  is  demonstrated  through  the 
implementation  of  a  verified,  automatic  stack-analysis  tool  that 
computes  derivations  in  the  quantitative  logic.  Finally,  we  have 
shown  through  experiments  that  our  framework  can  be  applied  to 
derive  precise  stack  bounds  for  typical  system  code. 

Our  work  opens  the  door  for  the  verification  of  powerful  static 
analysis  tools  for  quantitative  properties  that  operate  on  the  C 
level  rather  than  on  the  machine  code.  There  are  multiple  future 
research  directions  that  we  plan  to  explore  on  the  basis  of  the  present 
development.  For  one  thing,  we  want  to  use  our  quantitative  Hoare 
logic  to  verify  more  powerful  analysis  tools  that  can  automatically 
derive  stack-space  bounds  for  recursive  functions.  For  another 
thing,  we  plan  to  generalize  the  developed  concepts  to  apply  our 
technique  to  other  resources  such  as  heap-memory  and  clock-cycle 
consumption. 
Abstract

Many verification problems can be reduced to refinement verification.
However, existing work on verifying refinement of concurrent
programs either fails to prove the preservation of termination,
allowing a diverging program to trivially refine any program, or is
difficult to apply in compositional thread-local reasoning. In this
paper, we first propose a new simulation technique, which establishes
termination-preserving refinement and is a congruence with
respect to parallel composition. We then give a proof theory for the
simulation, which is the first Hoare-style concurrent program logic
supporting termination-preserving refinement proofs. We show two
key applications of our logic, i.e., verifying linearizability and lock-freedom
together for fine-grained concurrent objects, and verifying
full correctness of optimizations of concurrent algorithms.

Categories and Subject Descriptors D.2.4 [Software Engineering]:
Software/Program Verification - Correctness proofs, Formal
methods; F.3.1 [Logics and Meanings of Programs]: Specifying
and Verifying and Reasoning about Programs

General  Terms  Theory,  Verification 

Keywords  Concurrency,  Refinement,  Termination  Preservation, 
Rely-Guarantee  Reasoning,  Simulation 

1.  Introduction 

Verifying refinement between programs is the crux of many verification
problems. For instance, reasoning about compilation or
program transformations requires proving that every target program
is a refinement of its source [9]. In concurrent settings, recent
work [4, 12] shows that the correctness of concurrent data
structures and libraries can be characterized via some forms of contextual
refinements, i.e., every client that calls the concrete library
methods should refine the client with some abstract atomic operations.
Verification of concurrent garbage collectors [11] and OS
kernels [18] can also be reduced to refinement verification.

Refinement from the source program S to the target T, written
as $T \sqsubseteq S$, requires that T have no more observable behaviors
than S. Usually observable behaviors include the traces of external
events such as I/O operations and runtime errors. The question is,
should termination of the source be preserved too by the target? If
yes, how to verify such refinement?

Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.

CSL-LICS 2014, July 14–18, 2014, Vienna, Austria.
Copyright © 2014 ACM 978-1-4503-2886-9. . . $15.00.
http://dx.doi.org/10.1145/2603088.2603123

Preservation of termination is an indispensable requirement
in many refinement applications. For instance, compilation and
optimizations are not allowed to transform a terminating source
program to a diverging (non-terminating) target. Also, implementations
of concurrent data structures are often expected to have
progress guarantees (e.g., lock-freedom and wait-freedom) in addition
to linearizability. The requirements are equivalent to some
contextual refinements that preserve the termination of client programs
[12].

Most existing approaches for verifying concurrent program
refinement, including simulations (e.g., [11]), logical relations
(e.g., [22]), and refinement logics (e.g., [21]), do not reason
about the preservation of termination. As a result, a program that
loops forever without generating any external events, e.g.,
while true do skip, would trivially refine any source program
(just as it trivially satisfies partial correctness in Hoare logic).
Certainly this kind of refinement is not acceptable in the applications
mentioned above.
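A toy Python model makes this concrete. It is only a sketch under our own assumptions, not the formal definition given in Sec. 3: behaviors are modeled as plain sets, and the names prefixes and refines are ours. With termination-insensitive behaviors (prefix-closed sets of event traces) the silent infinite loop refines a program that prints and stops; once each behavior records how the run ends, the refinement fails.

```python
def prefixes(trace):
    """All finite prefixes of an event trace (what an observer may have seen so far)."""
    return {trace[:k] for k in range(len(trace) + 1)}


def refines(target_behaviors, source_behaviors):
    """Refinement as behavior inclusion: the target shows nothing the source cannot."""
    return target_behaviors <= source_behaviors


# Termination-insensitive view: only prefixes of the event trace are observed.
source_events = prefixes(("out 1",))   # source: print(1), then stop
diverging_events = prefixes(())        # target: while true do skip (no events at all)

# Termination-sensitive view: each behavior also records how the run ends.
source_full = {(("out 1",), "terminates")}
diverging_full = {((), "diverges")}

insensitive = refines(diverging_events, source_events)
sensitive = refines(diverging_full, source_full)
```

Here insensitive is True and sensitive is False, which is the gap that a termination-preserving notion of refinement has to close.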

CompCert [9] addresses the problem by introducing a well-founded
order in the simulation, but it works only for sequential
programs. It is difficult to apply this idea to do thread-local verification
of concurrent program refinement, which enables us to
know $T_1 \parallel T_2 \sqsubseteq S_1 \parallel S_2$ by proving $T_1 \sqsubseteq S_1$ and $T_2 \sqsubseteq S_2$.
In practice, the termination preservation in the refinement proofs of
individual threads could be easily broken by the interference from
their environments (i.e., other threads running in parallel). For instance,
a method call of a lock-free data structure (e.g., Treiber
stack) may never terminate when other threads call the methods
and update the shared memory infinitely often. As we will explain
in Sec. 2, the key challenge here is to effectively specify the environments’
effects on the termination preservation of individual
threads. As far as we know, no previous work can use “compositional”
thread-local reasoning to verify termination-preserving refinement
between (whole) concurrent programs.

In this paper, we first propose novel rely/guarantee conditions
which can effectively specify the interference over the termination
properties between a thread and its environment. Traditional
rely/guarantee conditions [8] are binary relations of program states
and they specify the state updates. We extend them with a boolean
tag indicating whether a state update may let the thread or its environment
make more moves.

    x++;

(a) source code S

    1 local t;
    2 t := x;
    3 x := t + 1;

(b) target code Tb

    1 local t, done := false;
    2 while (!done) {
    3   t := x;
    4   done := cas(&x, t, t+1);
    5 }

(c) target code Tc

Figure 1. Counters.

With the help of our new rely/guarantee conditions, we then
propose a new simulation RGSim-T, and a new Hoare-style program
logic, both of which support compositional verification
of termination-preserving refinement of concurrent programs.
Our work is based on our previous compositional simulation
RGSim [11] (which unfortunately cannot preserve termination),
and is inspired by Hoffmann et al.’s program logic for lock-freedom
[7] (which does not support refinement verification and
has limitations on local reasoning, as we will explain in Sec. 7), but
makes the following new contributions:

• We design a simulation, RGSim-T, to verify termination-preserving
refinement of concurrent programs. As an extension
of RGSim, it considers the interference between threads
and the environments by taking our novel rely/guarantee conditions
as parameters. RGSim-T is compositional. It allows us to
thread-locally reason about the preservation of whole-program
termination, but without enforcing the preservation of individual
threads’ termination, thus can be applied to many practical
refinement applications.

• We propose the first program logic that supports compositional
verification of termination-preserving refinement of concurrent
programs. In addition to a set of compositionality (binary reasoning)
rules, we also provide a set of unary rules (built upon
the unary program logic LRG [3]) that can reason about conditional
correspondence between the target and the source, a
usual situation in concurrent refinement (see Sec. 2). The logic
enables compositional verification of nested loops and supports
programs with infinite nondeterminism. The soundness of
the logic ensures RGSim-T between the target and the source,
which implies the termination-preserving refinement.

• Our simulation and logic are general. They can be applied
to verify linearizability and lock-freedom together for fine-grained
concurrent objects, or to verify the full correctness of
optimizations of concurrent programs, i.e., the optimized program
preserves behaviors on both functionality and termination
of the original one. We demonstrate the effectiveness of our
logic by verifying linearizability and lock-freedom of Treiber
stack [20], Michael-Scott queue [14] and DGLM queue [2], the
full correctness of synchronous queue [16] and the equivalence
between TTAS lock and TAS lock implementations [6].

It is important to note that our simulation and logic ensure that
the target preserves the termination/divergence behaviors of the
source. The target could diverge if the source diverges. Therefore
our logic is not for verifying total correctness (i.e., partial correctness
+ termination). It is actually more powerful and general. We
give more discussions on this point in Secs. 2.2 and 5.2.

In  the  rest  of  this  paper,  we  first  analyze  the  challenges  and 
explain  our  approach  informally  in  Sec.  2.  Then  we  formulate 
termination-preserving  refinement  in  Sec.  3.  We  present  our  new 
simulation  RGSim-T  in  Sec.  4  and  our  new  program  logic  in  Sec.  5. 
We  summarize  examples  that  we  verified  in  Sec.  6,  and  discuss  the 
related  work  and  conclude  in  Sec.  7. 

2.  Informal  Development 

Below  we  informally  explain  the  challenges  and  our  solutions  in 
our  design  of  the  simulation  and  the  logic  respectively. 


Figure 2. Simulation diagrams: (a) $T \preceq S$ (with $|T'| < |T|$);
(b) $R \vdash T \preceq' S$ (with $|T_2| < |T_1|$ and $|T_4| < |T_3|$).

2.1 Simulation

Simulation is a standard technique for refinement verification. We
start by showing a simple simulation for verifying sequential refinement
and then discuss its problems in termination-preserving
concurrent refinement verification.

Fig. 1(a) shows the source code S that increments x. In a
sequential setting, it can be implemented as Tb in Fig. 1(b). To
show that Tb refines S, a natural way is to prove that they satisfy
the (weak) simulation $\preceq$ in Fig. 2(a).

The simulation first establishes some consistency relation between
the source and the target (note S and T here are whole program
configurations consisting of both code and states). Then it
requires that there is some correspondence between the execution
of the target and the source so that the relation is always preserved.
Every execution step of the target must either correspond to one
or more steps of the source (the left part of Fig. 2(a)), or correspond
to zero steps (the right part; let’s ignore the requirement of
$|T'| < |T|$ for now).*

For our example in Fig. 1, the simulation requires that x at the
target level have the same value as x in the source. We let line 2
at Tb correspond to zero steps of S, and line 3 correspond to the
single step of S.

Such a simulation, however, has two problems for termination-preserving
concurrent refinement verification. First, it does not
require the target to preserve the termination of the source. Since
a silent step at the target level may correspond to zero steps at the
source (the right part of Fig. 2(a)), the target may execute such
steps infinitely many times and never correspond to a step at the
source. For instance, if we insert while true do skip before
line 2 in Tb, the simulation still holds, but Tb diverges now. To
address this problem, CompCert [9] introduces a metric $|T|$ over
the target program configurations, which is equipped with a well-founded
order $<$. If a target step corresponds to no moves of the
source, the metric over the target programs should strictly decrease
(i.e., the condition $|T'| < |T|$ in Fig. 2(a)). Since the well-founded
order ensures that there are no infinite decreasing chains, execution
of the target will finally correspond to at least one step at the source.

Second, it is not compositional w.r.t. parallel compositions.
Though $T_b \preceq S$ holds, $(T_b \parallel T_b); \mathtt{print}(x) \preceq (S \parallel S); \mathtt{print}(x)$
does not hold since the left side may print out 1, which is impossible
for the source on the right. The problem is that when we
prove $T_b \preceq S$, Tb and S are viewed as closed programs and the
interference from environments is ignored. To get the parallel compositionality,
we follow the ideas in our previous work RGSim [11]
and parameterize the simulation with the interference between the
programs and their environments.

As shown in Fig. 2(b), the new simulation $\preceq'$ is parameterized
with the environment interference R, i.e. the set of all possible
transitions of the environments at the target and source levels. Here
we use thin arrows for the transitions of the current thread at the
source and the target levels (e.g., from $T_1$ to $T_2$ and from $T_3$ to $T_4$
in Fig. 2(b)), and thick arrows for the possible environment steps
(e.g., from $T_2$ to $T_3$ and from $S_1$ to $S_2$ in the figure). We require
the simulation $\preceq'$ to be preserved by R.

* Note here we only discuss silent steps (a.k.a. $\tau$-steps) which produce no
external events. The simulation also requires that every step with an external
event at the target level must correspond to one step at the source with the
same event plus zero or multiple $\tau$-steps.

Then, to prove termination-preserving concurrent refinement, it
seems natural to combine the two ideas and have a simulation parameterized
with environment interference and a metric decreasing
for target steps that correspond to no steps at the source. Therefore
we require $|T_2| < |T_1|$ and $|T_4| < |T_3|$ in the case of Fig. 2(b). But
how would the environment steps change the metric?
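The failure of parallel compositionality for the sequential counter Tb of Fig. 1(b) can be checked by brute force. The Python sketch below is an illustration under our own assumptions (the helper names interleavings, final_values, read, write, and incr are hypothetical and each step is treated as atomic): it runs two copies of a thread from x = 0 under every interleaving and collects the values that a final print(x) could output.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def read(state, tid):
    state[tid] = state["x"]        # t := x  (t is the thread's local variable)


def write(state, tid):
    state["x"] = state[tid] + 1    # x := t + 1


def incr(state, tid):
    state["x"] += 1                # x++ executed as one atomic source step


SOURCE = [incr]            # S from Fig. 1(a)
TARGET_TB = [read, write]  # Tb from Fig. 1(b), lines 2 and 3


def final_values(thread_steps):
    """Final values of x when two copies of the thread run in parallel from x = 0."""
    results = set()
    steps_a = [("A", s) for s in thread_steps]
    steps_b = [("B", s) for s in thread_steps]
    for schedule in interleavings(steps_a, steps_b):
        state = {"x": 0, "A": 0, "B": 0}  # shared x plus one local t per thread
        for tid, step in schedule:
            step(state, tid)
        results.add(state["x"])
    return results
```

In this model final_values(SOURCE) contains only 2, whereas final_values(TARGET_TB) also contains 1: both copies of Tb may read x before either writes it back.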

First attempt. Our first attempt to answer this question is to allow
environment steps to arbitrarily change the metrics associated with
the target program configurations. Therefore it is possible to have
$|T_2| < |T_3|$ in Fig. 2(b).

The resulting simulation, however, is still not compositional
w.r.t. parallel compositions. For instance, for the following two
threads in the target program:

while (i==0) i--;  ||  while (i==0) i++;

we can prove that this simulation holds between each of them and
the source program skip, if we view i as local data used only
at the target level. We could define the metric as 1 if i = 0
and 0 otherwise. For the left thread, it decreases the metric if it
executes the loop body. The increment of i by its environment
(the right thread) may change i back to 0, increasing the metric.
This is allowed in our simulation. The case for the right thread is
symmetric. However, if we view the parallel composition of the two
threads as a whole program, it may not terminate, thus cannot be a
termination-preserving refinement of skip || skip.
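The divergence of the composed program can be observed in a small Python model. This is a sketch under our own assumptions: each loop test and each loop body is treated as one atomic step, the scheduler simply repeats a fixed list of thread names, and the names step and run are hypothetical. Because the schedule is deterministic and periodic, a global state that repeats at a round boundary means the run never ends.

```python
def step(thread, pc, i):
    """One atomic step of `while (i == 0) i += delta`, with delta = -1 for L and +1 for R."""
    delta = -1 if thread == "L" else 1
    if pc == "test":
        return ("body" if i == 0 else "done"), i
    if pc == "body":
        return "test", i + delta
    return pc, i  # a finished thread takes no further steps


def run(schedule, max_rounds=1000):
    """Repeat `schedule` (a list of thread names) starting from i = 0.

    Returns "terminates" once every scheduled thread is done, and "diverges"
    if the global state repeats at the start of a round.
    """
    pcs = {"L": "test", "R": "test"}
    i = 0
    seen = set()
    for _ in range(max_rounds):
        if all(pcs[t] == "done" for t in set(schedule)):
            return "terminates"
        state = (pcs["L"], pcs["R"], i)
        if state in seen:
            return "diverges"
        seen.add(state)
        for t in schedule:
            pcs[t], i = step(t, pcs[t], i)
    return "unknown"
```

Under this model run(["L"]) and run(["R"]) terminate, while the alternating schedule run(["L", "R"]) returns to its initial state after two rounds and therefore diverges.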

Second attempt. The first attempt is too permissive to have parallel
compositionality, because we allow a thread to make more
moves whenever its environment interferes with it. Thus our second
attempt enforces the metric of a thread to decrease or stay unchanged
under environment interference. For the case of Fig. 2(b),
we require $|T_3| \le |T_2|$ on environment steps.

This simulation is compositional, but it is too strong and cannot
be satisfied by many useful refinements. For instance, Tc in
Fig. 1(c) uses a compare-and-swap (cas) instruction to atomically
update x. It is a correct lock-free implementation of S in concurrent
settings, but the new simulation of our second attempt does
not hold between Tc and S. If an environment step between lines 3
and 4 of Tc increments x, the cas at line 4 will return false and Tc
needs to execute another round of loop. Therefore such an environment
step increases the number of silent steps of Tc that correspond
to no moves of S. However, our new simulation does not allow an
environment step to increase the metric, so the simulation cannot
be established.

Our  solution.  Our  solution  lies  in  the  middle  ground  of  the  two 
failed  attempts.  We  specify  explicitly  in  the  parameter  R  which 
environment  steps  may  make  the  current  thread  move  more  (i.e., 
allow  the  thread’s  metric  to  increase  in  the  simulation).  Here  we 
distinguish  in  R  the  steps  that  correspond  to  source  level  moves 
from  those  that  do  not.  We  allow  the  metric  to  be  increased  by  the 
former  (as  in  our  first  attempt),  but  not  by  the  latter  (which  must 
decrease  or  preserve  the  metric,  as  in  our  second  attempt). 

This  approach  is  based  on  the  observation  that  the  failure  of 
cas  in  Tc  of  Fig.  1(c)  must  be  caused  by  an  environment  step 
that  successfully  increments  x,  which  corresponds  to  a  step  at  the 
source  level.  Although  the  termination  of  the  current  thread  Tc  is 
delayed,  the  whole  system  consisting  of  both  the  current  thread  and 
the  environment  progresses  by  making  a  corresponding  step  at  the 
source  level.  Therefore,  the  delay  of  the  termination  of  the  current 
thread  should  be  acceptable,  and  we  should  allow  such  environment 
steps  to  increase  the  metric  of  the  current  thread. 
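This observation can be replayed on a Python model of Tc. The sketch rests on our own assumptions: shared memory is a dictionary, cas is modeled as an ordinary function (which is atomic here only because the model is single-threaded), and the parameter interfere_rounds, which makes the environment perform one successful x++ between lines 3 and 4 in the first few iterations, is hypothetical.

```python
def cas(mem, expected, new):
    """Model of compare-and-swap on mem["x"]; atomic here because the model is sequential."""
    if mem["x"] == expected:
        mem["x"] = new
        return True
    return False


def run_tc(mem, interfere_rounds):
    """Run Tc from Fig. 1(c) and return how many loop iterations it needs.

    During the first `interfere_rounds` iterations the environment increments x
    between the read (line 3) and the cas (line 4).
    """
    iterations = 0
    done = False
    while not done:
        t = mem["x"]                      # line 3: t := x
        if iterations < interfere_rounds:
            mem["x"] += 1                 # environment step: a successful source-level x++
        done = cas(mem, t, t + 1)         # line 4: done := cas(&x, t, t+1)
        iterations += 1
    return iterations
```

Starting from x = 0, run_tc(mem, 3) takes four iterations and leaves x = 4: each of the three failed rounds is paid for by one completed increment of the environment, so the system as a whole keeps making source-level progress.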

In this paper, we follow the idea of rely/guarantee reasoning [8]
and use the rely condition to specify environment steps. However,
we extend the traditional rely conditions with an extra boolean tag
indicating whether an environment step corresponds to a step at the
source level. Our new simulation RGSim-T extends RGSim by incorporating
the idea of metrics to achieve termination preservation.
It is parameterized with the new rely (and guarantee) conditions so
that we know how an environment step could affect the metric. The
formal definition of RGSim-T is given in Sec. 4.

Relationships to lock-freedom, obstruction-freedom and wait-freedom.
If the source program is just a single atomic operation
(e.g., x++), our new simulation RGSim-T can be viewed as a
proof technique for lock-freedom of the target, which ensures that
there always exists some thread that will complete an operation at
the source level in a finite number of steps. That is, the failure of
a thread to finish its operation must be caused by the successful
completion of source operations by its environment.

In fact, the simulations of our first and second attempts can
be viewed as proof techniques for obstruction-freedom and
wait-freedom respectively of concurrent objects. Obstruction-freedom
ensures that every thread will complete its operation whenever it is
executed in isolation (i.e., without interference from other threads).
In the simulation of our first attempt, though a thread is allowed to
not make progress under environment interference, it has to complete
some source operations when its environment does not interfere.
Wait-freedom ensures the completion of the operation of any
thread. Correspondingly, in the simulation of our second attempt, a
thread has to make progress no matter what the environment does.

2.2  Program  Logic 

The compositionality of our new simulation RGSim-T allows us
to decompose the refinement for large programs into refinements
for small program units, so we can derive a set of syntactic
Hoare-style rules for refinement verification, as we did for
RGSim [11]. For instance, a sequential composition rule may take
the following form:

$$\frac{R \vdash \{P\}\, T_1 \preceq S_1\, \{P'\} \qquad R \vdash \{P'\}\, T_2 \preceq S_2\, \{Q\}}{R \vdash \{P\}\, T_1; T_2 \preceq S_1; S_2\, \{Q\}}$$

Here we use $R \vdash \{P\}\, T \preceq S\, \{Q\}$ to represent the corresponding
syntactic judgment of RGSim-T. $R$ denotes the environment
interference. $P$, $Q$ and $P'$ are relational assertions that relate the
program states at the target and the source levels. The rule says if we
could establish refinements (in fact, RGSim-Ts) between $T_1$ and
$S_1$, and between $T_2$ and $S_2$, we know $T_1; T_2$ refines $S_1; S_2$. We
could give similar rules for parallel composition and other
compositional commands.

However, in many cases the correspondence between program
units at the target and the source levels cannot be determined
statically. That is, just by looking at $T_1; T_2$ and $S_1; S_2$, we may
not know statically that $T_1$ refines $S_1$ and $T_2$ refines $S_2$ and then
apply the above sequential composition rule. To see the problem,
we unfold the while-loop of $T_c$ in Fig. 1 and get the following $T_c'$:

    1  local t, done;
    2  t := x;
    3  done := cas(&x, t, t+1);
    4  while (!done) {
    5    t := x;
    6    done := cas(&x, t, t+1);
    7  }

Clearly $T_c'$ refines $S$ too. However, whether the cas instruction at
line 3 fulfils the operation in $S$ or not depends on whether the
comparison succeeds at runtime. Thus we cannot apply the
compositionality rules for RGSim-T to decompose the refinement about $T_c'$.
We have to refer to the semantics of the simulation definition to
prove the refinement, which would be rather ineffective for
large-scale programs. Similar issues also show up in our earlier work on
RGSim [11], and in relational Hoare logic [1] and relational
separation logic [25] if they are applied to concurrent settings.

To address this problem, we extend the assertion language to
specify as auxiliary state the source code remaining to be refined.
In addition to the binary judgment $R \vdash \{P\}\, T \preceq S\, \{Q\}$, we
introduce a unary judgment of the form
$R \vdash \{P \land \mathsf{arem}(S)\}\, T\, \{Q \land \mathsf{arem}(S')\}$
for refinements that cannot be decomposed. Here
$\mathsf{arem}(S)$ means $S$ is the remaining source to be refined by the
target. Then $R \vdash \{P \land \mathsf{arem}(S)\}\, T\, \{Q \land \mathsf{arem}(\mathbf{skip})\}$ says
that $T$ refines $S$, since the postcondition shows at the end of the
target $T$ there are no remaining operations from $S$ to be refined.
We provide the following rule to derive the binary judgment from
the unary one:

$$\frac{R \vdash \{P \land \mathsf{arem}(S)\}\, T\, \{Q \land \mathsf{arem}(\mathbf{skip})\}}{R \vdash \{P\}\, T \preceq S\, \{Q\}}$$

On the other hand, if the final remaining source is the same as
the initial one, we know the execution steps of the target correspond
to zero source steps. Then for $T_c'$ above, we can give pre- and
post-conditions for line 3 as follows:

    $\{\cdots \land \mathsf{arem}(S)\}$
    done := cas(&x, t, t+1)
    $\{\cdots \land (\mathit{done} \land \mathsf{arem}(\mathbf{skip}) \lor \neg \mathit{done} \land \mathsf{arem}(S))\}$

As  the  postcondition  shows,  whether  the  cas  instruction  refines  S 
or  not  is  now  conditional  upon  the  value  of  done.  Thanks  to  the 
new  assertions  arem(S),  we  can  reduce  the  relational  and  semantic 
refinement  proofs  to  unary  and  syntactic  Hoare-style  reasoning. 
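
As an informal illustration of how the auxiliary state behaves at line 3, the following sketch models the cas instruction together with the remaining source code. The function `cas_step` and the encoding of source code as the strings `"x++"` and `"skip"` are assumptions made for this example only.

```python
def cas_step(x, t, arem):
    """Model `done := cas(&x, t, t+1)` plus the auxiliary state `arem`,
    the source code still to be refined (here S is x++).

    Returns (new_x, done, new_arem).
    """
    assert arem == "x++"            # precondition: ... /\ arem(S)
    if x == t:
        # The comparison succeeds: the source step x++ is fulfilled here.
        return t + 1, True, "skip"
    # The comparison fails: the target step matches zero source steps.
    return x, False, "x++"
```

In both branches the result satisfies the postcondition given above: either done holds and nothing of S remains, or done is false and S is still pending.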

The key to verifying the preservation of termination is the rule
for while loops. One may first think of the total correctness rule for
while loops in Hoare-style logics (e.g., [19]). However, preserving
the termination does not necessarily mean that the code must
terminate, and the total correctness rule would not be applicable in many
cases. For example, the following $T_c''$ and $S'$ never terminate:

    $T_c''$:
      local t;
      while (true) {
        t := x;
        cas(&x, t, t+1);
      }

    $S'$:
      while (true)
        x++;

but $T_c'' \preceq S'$ holds for our RGSim-T: every iteration of $T_c''$
either corresponds to a step of $S'$, or is interfered with by environment
steps corresponding to source moves.

Inspired  by  Hoffmann  et  al.’s  logic  for  lock-freedom  [7],  we 
introduce  a  counter  n  (i.e.  the  number  of  tokens  assigned  to  the 
current  thread)  as  a  while-specific  metric,  which  means  the  thread 
can  only  run  the  loop  for  no  more  than  n  rounds  before  it  or  its 
environment  fulfils  one  or  more  source-level  moves.  The  counter 
is  treated  as  an  auxiliary  state,  and  decreases  at  the  beginning  of 
every  round  of  loop  (i.e.,  we  pay  one  token  for  each  iteration). 
If  we  reach  a  step  in  the  loop  body  that  corresponds  to  source 
moves,  we  could  reset  the  counter  to  increase  the  number  of  tokens. 
Tokens  could  also  increase  under  environment  interference  if  the 
environment  step  corresponds  to  source  moves.  Correspondingly 
our  WHILE  rule  is  in  the  following  form  (we  give  a  simplified 
version  to  demonstrate  the  idea  here.  The  actual  rule  is  given  in 
Sec.  5): 


$$\frac{P \land B \Rightarrow P' * \mathsf{wf}(1) \qquad R \vdash \{P'\}\, C\, \{P\}}{R \vdash \{P\}\, \mathbf{while}\ (B)\ C\, \{P \land \neg B\}}$$

We use $\mathsf{wf}(1)$ to represent one token, and $*$ for the normal
separating conjunction in separation logic. To verify the loop body $C$,
we use the precondition $P'$, which has one less token than $P$,
showing that one token has been consumed to start this new round of
the loop. During the execution of $C$, the number of tokens could be
increased if $C$ itself or its environment steps correspond to source
moves. As usual, the loop invariant $P$ needs to be re-established at
the end of $C$.


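    (Event)  $e ::= \ldots$
    (Label)  $\iota ::= e \mid \tau$
    (Store)  $s, \mathbb{s} \in \mathit{PVar} \rightharpoonup \mathit{Val}$
    (Heap)   $h, \mathbb{h} \in \mathit{Addr} \rightharpoonup \mathit{Val}$
    (State)  $\sigma, \Sigma ::= (s, h)$
    (Instr)  $c, \mathbb{c} ::= \ldots$
    (Expr)   $E, \mathbb{E} ::= x \mid n \mid E + E \mid \ldots$
    (BExp)   $B, \mathbb{B} ::= \mathbf{true} \mid \mathbf{false} \mid E = E \mid\ !B \mid \ldots$
    (Stmt)   $C, \mathbb{C} ::= \mathbf{skip} \mid c \mid \langle C \rangle \mid C_1; C_2 \mid \mathbf{if}\ (B)\ C_1\ \mathbf{else}\ C_2 \mid \mathbf{while}\ (B)\ C \mid C_1 \parallel C_2$

Figure 3. Generic language at target and source levels.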


To prove that $T_c''$ shown above preserves the termination of $S'$,
we set the initial number of tokens to 1. We use up the token at
the first iteration, but could gain another token during the iteration
(either by self moves or by environment steps) to pay for the next
iteration. We can see that the above reasoning with tokens coincides
with the direct refinement proof in our simulation RGSim-T. In fact,
RGSim-T can serve as the meta-theory of our logic.
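
The token accounting for this loop can be sketched as follows. This is an illustrative model only: the function `run_with_tokens` and the outcome labels `"self"`, `"env"` and `"none"` are hypothetical names, and resetting the counter to exactly one token is one possible choice rather than the rule itself.

```python
def run_with_tokens(outcomes, initial_tokens=1):
    """Replay loop iterations while tracking the token counter.

    Each entry of `outcomes` describes one iteration:
      "self": the thread's own cas succeeded (a source-level move),
      "env" : the cas failed because the environment made a source move,
      "none": no source-level move happened at all.
    Returns the token count after each iteration and raises RuntimeError
    if an iteration starts without a token to pay for it.
    """
    tokens = initial_tokens
    history = []
    for outcome in outcomes:
        if tokens < 1:
            raise RuntimeError("no token left to pay for this iteration")
        tokens -= 1                  # pay one token at the start of the round
        if outcome in ("self", "env"):
            tokens = 1               # a source-level move resets the counter
        history.append(tokens)
    return history
```

As long as every iteration is accompanied by a source-level move, by the thread or by its environment, one initial token suffices; an iteration without any source move leaves the counter at zero and the next round cannot be paid for.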

The use of tokens as an explicit metric for termination
reasoning poses another challenge, which is to handle infinite
nondeterminism. Consider the following target C.

    C:  x := 0; while (i > 0) i--;

Assume the environment R may arbitrarily update i when x is not
0, but does not change anything when x is 0. We hope to verify C
refines skip. We can see that the loop in C must terminate (thus the
refinement holds), and the number n of tokens must be no less than
the value of i at the beginning of the loop. But we cannot decide
the value of n before executing x := 0. This example cannot be
verified if we have to predetermine and specify the metric for the
while loops at the very beginning of the whole program.

To address this issue, we introduce the following hiding rule:

$$\frac{R \vdash \{p\}\, C\, \{q\}}{R \vdash \{\lfloor p \rfloor_w\}\, C\, \{\lfloor q \rfloor_w\}}$$

Here $\lfloor p \rfloor_w$ discards all the knowledge about tokens in $p$. For the
above example, we can hide the number of tokens after we verify
the while loop. Then we do not need to specify the number of
tokens in the precondition of the whole program. We formally
present the set of logic rules in Sec. 5.
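
The difficulty with infinite nondeterminism can be made concrete with a short sketch of the program C above. The function `tokens_needed`, the initial value of x, and the list of environment writes are assumptions made for illustration; the paper leaves the initial state unspecified.

```python
def tokens_needed(env_updates):
    """Run C: x := 0; while (i > 0) i--;

    `env_updates` lists the values the environment writes to i before
    `x := 0` executes, which is allowed because x is non-zero until then.
    Returns the number of loop iterations, i.e. the tokens the loop needs.
    """
    x, i = 1, 0                  # assumed initial state with x != 0
    for value in env_updates:
        assert x != 0            # the environment may only update i while x != 0
        i = value
    x = 0                        # from here on the environment changes nothing
    iterations = 0
    while i > 0:
        i -= 1
        iterations += 1
    return iterations
```

The loop always terminates, but the number of iterations depends on the last value the environment wrote, so no bound can be fixed before `x := 0` has run. This is the knowledge that the hiding rule allows a proof to discard.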

3.  Formal  Settings  and  Termination-Preserving 
Refinement 

In this section, we define the termination-preserving refinement $\sqsubseteq$,
which is the proof goal of our RGSim-T and logic.

3.1  The  Language 

Fig. 3 shows the programming language for both the source and
the target levels. We model the program semantics as a labeled
transition system. A label $\iota$ that will be associated with a state
transition is either an event $e$ or $\tau$. The latter marks a silent step
generating no events.

A state $\sigma$ is a pair of a store and a heap. The store $s$ is a
finite partial mapping from program variables to values (e.g.,
integers and memory addresses) and a heap $h$ maps memory addresses
to values. Statements $C$ are either primitive instructions or
compositions of them. A single-step execution of statements is modeled
as a labeled transition: $(C, \sigma) \xrightarrow{\iota} (C', \sigma')$. We abstract away the
form of an instruction $c$. It may generate an external event (e.g.,
print(E) generates an output event). It may be non-deterministic
(e.g., x := nondet assigns a random value to x). It may also be
blocked at some states (e.g., requesting a lock). We assume
primitive instructions are atomic in the semantics. We also provide an
atomic block $\langle C \rangle$ to execute a piece of code $C$ atomically. Then
the generic language in Fig. 3 is expressive enough for the source
and the target programs which may have different granularities of
state accesses. Due to the space limit, the operational semantics and
more details about the language are formally presented in TR [13].

$$\frac{(C, \sigma) \longrightarrow^{+} \mathbf{abort}}{ETr(C, \sigma, \lightning)} \qquad \frac{(C, \sigma) \xrightarrow{e}{}^{+} (C', \sigma') \quad ETr(C', \sigma', \mathcal{E})}{ETr(C, \sigma, e :: \mathcal{E})}$$

$$\frac{(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')}{ETr(C, \sigma, \Downarrow)} \qquad \frac{(C, \sigma) \longrightarrow^{+} (C', \sigma') \quad ETr(C', \sigma', \epsilon)}{ETr(C, \sigma, \epsilon)}$$

Figure 4. Co-inductive definition of $ETr(C, \sigma, \mathcal{E})$.

Conventions. We usually write blackboard bold or capital letters
($\mathbb{s}$, $\mathbb{h}$, $\Sigma$, $\mathbb{c}$, $\mathbb{E}$, $\mathbb{B}$ and $\mathbb{C}$) for the notations at the source level to
distinguish from the target-level ones ($s$, $h$, $\sigma$, $c$, $E$, $B$ and $C$).

Below we use $\_ \longrightarrow^{*} \_$ for zero or multiple-step transitions with
no events generated, $\_ \longrightarrow^{+} \_$ for multiple-step transitions without
events, and $\_ \xrightarrow{e}{}^{+} \_$ for multiple-step transitions with only one
event $e$ generated.

3.2  Termination-Preserving  Event  Trace  Refinement 

Now we formally define the refinement relation $\sqsubseteq$ that relates
the observable event traces generated by the source and the target
programs. A trace $\mathcal{E}$ is a finite or infinite sequence of external events
$e$, and may end with a termination marker $\Downarrow$ or an abortion marker
$\lightning$. It is co-inductively defined as follows.

    (EvtTrace)  $\mathcal{E} ::= \Downarrow \mid \lightning \mid \epsilon \mid e :: \mathcal{E}$   (co-inductive)

We use $ETr(C, \sigma, \mathcal{E})$ to say that the trace $\mathcal{E}$ is produced by
executing $C$ from the state $\sigma$. It is co-inductively defined in Fig. 4.
Here skip plays the role of a flag showing the end of execution (the
normal termination). Unsafe executions lead to abort. We know
if $C$ diverges at $\sigma$, then its trace $\mathcal{E}$ is either of infinite length or
finite but does not end with $\Downarrow$ or $\lightning$. For instance, while (true) skip
only produces an empty trace $\epsilon$, and while (true) {print(1)} only
produces an infinite trace of output events.

Then we define a refinement $(C, \sigma) \sqsubseteq (\mathbb{C}, \Sigma)$, saying that
every event trace generated by $(C, \sigma)$ at the target level can be
reproduced by $(\mathbb{C}, \Sigma)$ at the source. Since we could distinguish traces of
diverging executions from those of terminating executions, the
refinement definition ensures that if $(C, \sigma)$ diverges, so does $(\mathbb{C}, \Sigma)$.
Thus we know the target preserves the termination of the source.

Definition 1 (Termination-Preserving Refinement).

$(C, \sigma) \sqsubseteq (\mathbb{C}, \Sigma)$ iff $\forall \mathcal{E}.\ ETr(C, \sigma, \mathcal{E}) \Longrightarrow ETr(\mathbb{C}, \Sigma, \mathcal{E})$.
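
For small finite-state programs, Definition 1 can be approximated by enumerating traces. The sketch below is a bounded approximation under several assumptions: programs are given directly as finite labeled transition systems, infinite event traces are ignored (only traces with at most `max_events` events are collected), and the names `finite_traces` and `refines` are hypothetical. A silently diverging run is recorded with the end tag `"div"`, standing for a finite trace without an end marker.

```python
def finite_traces(lts, start, max_events=3):
    """Collect traces with at most `max_events` events.

    `lts` maps a state to a list of (label, next_state) pairs; the label
    None is a silent step. The states "skip" and "abort" are terminal.
    A trace is (events, end) with end in {"term", "abort", "div"}.
    """
    traces = set()

    def silent_diverges(state, on_path):
        # An infinite run of silent steps exists iff a silent cycle is reachable.
        if state in on_path:
            return True
        return any(label is None and silent_diverges(nxt, on_path | {state})
                   for label, nxt in lts.get(state, []))

    def explore(state, events, seen):
        if state in ("skip", "abort"):
            traces.add((events, "term" if state == "skip" else "abort"))
            return
        if silent_diverges(state, frozenset()):
            traces.add((events, "div"))
        for label, nxt in lts.get(state, []):
            new_events = events if label is None else events + (label,)
            if len(new_events) > max_events or (nxt, new_events) in seen:
                continue
            explore(nxt, new_events, seen | {(nxt, new_events)})

    explore(start, (), frozenset({(start, ())}))
    return traces


def refines(target, t_start, source, s_start):
    """Bounded check: every collected target trace is also a source trace."""
    return finite_traces(target, t_start) <= finite_traces(source, s_start)
```

With this encoding, a target that spins silently forever does not refine a source that terminates immediately, because the trace tagged `"div"` has no counterpart at the source. This mirrors how the definition distinguishes diverging from terminating executions.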

4.  RGSim-T:  A  Compositional  Simulation  with 
Termination  Preservation 

Below we propose RGSim-T, a new simulation as a compositional
proof technique for the above termination-preserving refinement.
As we explained in Sec. 2, the key to compositionality is to
parameterize the simulation with the interferences between the programs
and their environments. In this paper, we specify the interferences
using rely/guarantee conditions [8], but extend them to also specify
the effects on the termination preservation of individual threads.

Our simulation relation between $C$ and $\mathbb{C}$ is in the form of
$R, G, I \models \{P\}\, C \preceq \mathbb{C}\, \{Q\}$. It takes $R$, $G$, $I$, $P$ and $Q$ as
parameters. $R$ and $G$ are rely and guarantee conditions specifying the
interference between the current thread and its environment. The
assertion $I$ specifies the consistency relation between states at the
target and the source levels, which needs to be preserved during
the execution. $P$ specifies the pair of initial states at the target and
the source levels from which the simulation holds, and $Q$ is about
the pair of final states when the target and the source terminate. So
before we give our definition of RGSim-T, we first introduce our
assertion language.

    (RelAssn)   $P, Q, I ::= B \mid \mathsf{own}(x) \mid \mathsf{emp} \mid E \mapsto E \mid \mathbb{E} \Mapsto \mathbb{E} \mid P * Q \mid P \lor Q \mid \lfloor\lfloor p \rfloor\rfloor \mid \ldots$
    (FullAssn)  $p, q ::= P \mid \mathsf{arem}(\mathbb{C}) \mid \mathsf{wf}(E) \mid \lfloor p \rfloor_a \mid \lfloor p \rfloor_w \mid p * q \mid p \lor q \mid \ldots$
    (RelAct)    $R, G ::= [P] \mid P \ltimes Q \mid P \propto Q \mid R * R \mid R^{+} \mid \ldots$

Figure 5. Assertion language.

4.1  Assertions  and  New  Rely/Guarantee  Conditions 

We show the syntax of the basic assertion language in Fig. 5,
including the state assertions $P$ and $Q$, and our new rely/guarantee
conditions $R$ and $G$ (let's first ignore the assertions $p$ and $q$, which
will be explained in Sec. 5).

The state assertions $P$ and $Q$ relate the program states $\sigma$ and $\Sigma$
at the target and source levels. They are separation logic assertions
over a pair of states. We show their semantics in the top part of
Fig. 6. For simplicity, we assume the program variables used in
the target code are different from the ones in the source (e.g., we
use x and X for target and source level variables respectively). $B$
holds if it evaluates to true at the disjoint union of the target and the
source stores $s$ and $\mathbb{s}$. We treat program variables as resources [15]
and use $\mathsf{own}(x)$ for the ownership of the program variable $x$.
The assertion $E_1 \mapsto E_2$ specifies a singleton heap of the target
level with $E_2$ stored at the address $E_1$ and requires that the stores
contain variables used to evaluate $E_1$ and $E_2$. Its counterpart for
source level heaps is represented as $\mathbb{E}_1 \Mapsto \mathbb{E}_2$, whose semantics is
defined similarly. $\mathsf{emp}$ describes empty stores and heaps at both
levels. The semantics of separating conjunction $P * Q$ is similar to that
in separation logic, except that it is now lifted to assertions over
relational states $(\sigma, \Sigma)$. The union of two disjoint relational states
$(\sigma_1, \Sigma_1)$ and $(\sigma_2, \Sigma_2)$ is defined in the middle part of Fig. 6. We
will define the assertion $\lfloor\lfloor p \rfloor\rfloor$ in Sec. 5 (see Fig. 8), which ignores
the additional information other than the relational states about $p$.

Our new rely/guarantee assertions $R$ and $G$ specify the
transitions over the relational states $(\sigma, \Sigma)$ and also the effects on
termination preservation. Their semantics is defined in the bottom part
of Fig. 6. Here we use $\mathcal{S}$ for the relational states. A model
consists of the initial relational state $\mathcal{S}$, the resulting state $\mathcal{S}'$, and an
effect bit $b$ to record whether the target transitions correspond to
some source steps and can affect the termination preservation of
the current thread (for $R$) or other threads (for $G$).

We use $[P]$ for identity transitions with the relational states
satisfying $P$. The action $P \ltimes Q$ says that the initial relational states
satisfy $P$ and the resulting states satisfy $Q$. For these two kinds
of actions, we do not care whether there is any source step in the
transition satisfying them (the effect bit $b$ in their interpretations
could either be true or false). We also introduce a new action
$P \propto Q$ asserting that one or more steps are made at the source level
(the effect bit $b$ must be true). Following LRG [3], we introduce
separating conjunction over actions to locally reason about shared
state updates. $R_1 * R_2$ means that the actions $R_1$ and $R_2$ start from
disjoint relational states and the resulting states are also disjoint.
But here we also require consistency over the effect bits for the two
disjoint state transitions. We use $R^{+}$ for the transitive closure of
$R$, where the effect bits in consecutive transitions are accumulated.
The syntactic sugars Id, Emp and True represent arbitrary identity
transitions, empty transitions and arbitrary transitions respectively.
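
Over a finite state space, the actions described above can be modeled as sets of triples $(\mathcal{S}, \mathcal{S}', b)$. The following sketch is illustrative only: the helper names `arrow`, `source_move` and `closure`, and the use of plain Python values as relational states, are assumptions of this example.

```python
def arrow(P, Q, states):
    """Action "P ⋉ Q": start in P, end in Q, with either effect bit."""
    return {(s, s2, b) for s in states for s2 in states
            for b in (False, True) if P(s) and Q(s2)}


def source_move(P, Q, states):
    """Action "P ∝ Q": like arrow, but the effect bit must be true."""
    return {(s, s2, True) for s in states for s2 in states if P(s) and Q(s2)}


def closure(R):
    """Transitive closure R+ of a finite action R.

    A composed transition carries the disjunction of the effect bits of
    its parts, so one source-level move anywhere in the chain is kept.
    """
    result = set(R)
    while True:
        new = {(s, s2, b1 or b2)
               for (s, mid, b1) in R
               for (mid2, s2, b2) in result
               if mid == mid2}
        if new <= result:
            return result
        result |= new
```

For instance, composing a transition with effect bit false and one with effect bit true yields a transition whose bit is true, which is how accumulated environment steps can license an increase of the metric.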

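Since we logically split states into local and shared parts as in
LRG [3], we need a precise invariant $I$ to fence actions over shared
states, which is a state assertion like $P$ and $Q$. We define the fence
$I \triangleright R$ in a similar way as in our previous work [10] and LRG [3],
which says that $I$ precisely determines the boundaries of the states
of the transitions in $R$ (see Fig. 6). The formal definition of the
precise requirement $\mathsf{Precise}(I)$ is given in TR [13], which follows
its usual meaning as in separation logic but is now interpreted over
relational states.

    $((s, h), (\mathbb{s}, \mathbb{h})) \models B$  iff  $[\![B]\!]_{s \uplus \mathbb{s}} = \mathbf{true}$
    $((s, h), (\mathbb{s}, \mathbb{h})) \models \mathsf{own}(x)$  iff  $\mathit{dom}(s \uplus \mathbb{s}) = \{x\}$
    $((s, h), (\mathbb{s}, \mathbb{h})) \models E_1 \mapsto E_2$  iff  $h = \{[\![E_1]\!]_{s \uplus \mathbb{s}} \leadsto [\![E_2]\!]_{s \uplus \mathbb{s}}\}$
    $((s, h), (\mathbb{s}, \mathbb{h})) \models \mathsf{emp}$  iff  $s = h = \mathbb{s} = \mathbb{h} = \emptyset$

    $f_1 \perp f_2$  iff  $(\mathit{dom}(f_1) \cap \mathit{dom}(f_2) = \emptyset)$
    $f_1 \uplus f_2 \stackrel{\mathrm{def}}{=} f_1 \cup f_2$,  if $f_1 \perp f_2$
    $(s_1, h_1) \perp (s_2, h_2)$  iff  $(s_1 \perp s_2) \land (h_1 \perp h_2)$
    $(s_1, h_1) \uplus (s_2, h_2) \stackrel{\mathrm{def}}{=} (s_1 \cup s_2, h_1 \cup h_2)$,  if $(s_1, h_1) \perp (s_2, h_2)$
    $(\sigma_1, \Sigma_1) \uplus (\sigma_2, \Sigma_2) \stackrel{\mathrm{def}}{=} (\sigma_1 \uplus \sigma_2, \Sigma_1 \uplus \Sigma_2)$,  if $\sigma_1 \perp \sigma_2$ and $\Sigma_1 \perp \Sigma_2$

    $\mathcal{S} ::= (\sigma, \Sigma)$
    $(\mathcal{S}, \mathcal{S}', b) \models [P]$  iff  $(\mathcal{S} \models P) \land (\mathcal{S} = \mathcal{S}')$
    $(\mathcal{S}, \mathcal{S}', b) \models P \ltimes Q$  iff  $(\mathcal{S} \models P) \land (\mathcal{S}' \models Q)$
    $(\mathcal{S}, \mathcal{S}', b) \models P \propto Q$  iff  $(\mathcal{S} \models P) \land (\mathcal{S}' \models Q) \land (b = \mathbf{true})$
    $(\mathcal{S}, \mathcal{S}', b) \models R_1 * R_2$  iff  $\exists \mathcal{S}_1, \mathcal{S}_2, \mathcal{S}_1', \mathcal{S}_2'.\ (\mathcal{S} = \mathcal{S}_1 \uplus \mathcal{S}_2) \land (\mathcal{S}' = \mathcal{S}_1' \uplus \mathcal{S}_2') \land ((\mathcal{S}_1, \mathcal{S}_1', b) \models R_1) \land ((\mathcal{S}_2, \mathcal{S}_2', b) \models R_2)$
    $(\mathcal{S}, \mathcal{S}', b) \models R^{+}$  iff  $((\mathcal{S}, \mathcal{S}', b) \models R) \lor (\exists \mathcal{S}'', b', b''.\ ((\mathcal{S}, \mathcal{S}'', b') \models R) \land ((\mathcal{S}'', \mathcal{S}', b'') \models R^{+}) \land (b = b' \lor b''))$

    $\mathsf{Id} \stackrel{\mathrm{def}}{=} [\mathbf{true}]$    $\mathsf{Emp} \stackrel{\mathrm{def}}{=} \mathsf{emp} \ltimes \mathsf{emp}$    $\mathsf{True} \stackrel{\mathrm{def}}{=} \mathbf{true} \ltimes \mathbf{true}$
    $I \triangleright R$  iff  $([I] \Rightarrow R) \land (R \Rightarrow I \ltimes I) \land \mathsf{Precise}(I)$
    $\mathsf{Sta}(P, R)$  iff  $\forall \mathcal{S}, \mathcal{S}', b.\ (\mathcal{S} \models P) \land ((\mathcal{S}, \mathcal{S}', b) \models R) \Longrightarrow (\mathcal{S}' \models P)$

Figure 6. Semantics of assertions (part I).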
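
The disjoint union of relational states used by the separating conjunction can be sketched with Python dictionaries standing in for stores and heaps. The function names below and the choice to return `None` for overlapping domains are assumptions of this illustration.

```python
def disjoint_union(f1, f2):
    """Union of two finite maps; None when their domains overlap."""
    if f1.keys() & f2.keys():
        return None
    return {**f1, **f2}


def union_states(st1, st2):
    """Union of two (store, heap) states, taken componentwise."""
    store = disjoint_union(st1[0], st2[0])
    heap = disjoint_union(st1[1], st2[1])
    return None if store is None or heap is None else (store, heap)


def union_relational(rs1, rs2):
    """Union of two relational states (target state, source state)."""
    target = union_states(rs1[0], rs2[0])
    source = union_states(rs1[1], rs2[1])
    return None if target is None or source is None else (target, source)
```

A relational state can therefore be split into two parts only when neither the target side nor the source side shares a variable or an address between the parts.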

4.2  Definition  of  RGSim-T 

Our simulation RGSim-T is parameterized over the rely/guarantee
conditions $R$ and $G$ to specify the interferences between threads
and their environments, and a precise invariant $I$ to logically
determine the boundaries of the shared states and fence $R$ and $G$.

The simulation also takes a metric $M$, which was referred to
as $|T|$ in our previous informal explanations in Sec. 2. We leave
its type unspecified here, which can be instantiated by program
verifiers, as long as it is equipped with a well-founded order $<$.

The formal definition below follows the intuition explained in
Sec. 2. Readers who are interested only in the proof theory could
skip this definition, which can be viewed as the meta-theory of our
program logic presented in Sec. 4.3 and Sec. 5.

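Definition 2 (RGSim-T). $R, G, I \models \{P\}\, C \preceq \mathbb{C}\, \{Q\}$ iff
for all $\sigma$ and $\Sigma$, if $(\sigma, \Sigma) \models P$, then there exists $M$ such that
$R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$.

Here $R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$ is the largest relation
such that whenever $R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$, then
$(\sigma, \Sigma) \models I * \mathbf{true}$ and the following are true:

1. for any $C'$, $\sigma''$, $\sigma_F$ and $\Sigma_F$, if $(C, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$ and
   $\Sigma \perp \Sigma_F$, then there exist $\sigma'$, $n$, $M'$, $b$, $\mathbb{C}'$ and $\Sigma'$ such that
   (a) $\sigma'' = \sigma' \uplus \sigma_F$,
   (b) $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{n} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$,
   (c) $R, G, I \models (C', \sigma', M') \preceq_Q (\mathbb{C}', \Sigma')$,
   (d) $((\sigma, \Sigma), (\sigma', \Sigma'), b) \models G^{+} * \mathsf{True}$, and
   (e) if $n = 0$, we need $M' < M$ and $b = \mathbf{false}$, otherwise $b = \mathbf{true}$.

2. for any $e$, $C'$, $\sigma''$, $\sigma_F$ and $\Sigma_F$, if $(C, \sigma \uplus \sigma_F) \xrightarrow{e} (C', \sigma'')$ and
   $\Sigma \perp \Sigma_F$, then there exist $\sigma'$, $M'$, $\mathbb{C}'$ and $\Sigma'$ such that
   (a) $\sigma'' = \sigma' \uplus \sigma_F$,
   (b) $(\mathbb{C}, \Sigma \uplus \Sigma_F) \xrightarrow{e}{}^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$,
   (c) $R, G, I \models (C', \sigma', M') \preceq_Q (\mathbb{C}', \Sigma')$, and
   (d) $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$.

3. for any $b$, $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R^{+} * \mathsf{Id}$, then
   there exists $M'$ such that
   (a) $R, G, I \models (C, \sigma', M') \preceq_Q (\mathbb{C}, \Sigma')$, and
   (b) if $b = \mathbf{false}$, we need $M' = M$.

4. if $C = \mathbf{skip}$, then for any $\Sigma_F$ such that $\Sigma \perp \Sigma_F$, there exist $n$
   and $\Sigma'$ such that
   (a) $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{n} (\mathbf{skip}, \Sigma' \uplus \Sigma_F)$,
   (b) $(\sigma, \Sigma') \models Q$,
   (c) if $n > 0$, then $((\sigma, \Sigma), (\sigma, \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$.

5. for any $\sigma_F$ and $\Sigma_F$, if $(C, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$ and $\Sigma \perp \Sigma_F$, then
   $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$.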


The simulation $R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$ relates the
executions of the target configuration $(C, \sigma)$ (with its metric $M$)
to the source $(\mathbb{C}, \Sigma)$, under the interferences with the environment
specified by $R$ and $G$. It first requires that the relational state $(\sigma, \Sigma)$
satisfy $I * \mathbf{true}$, $I$ for the shared part and true for the local part,
establishing a consistency relation between the states at the two
levels. For every silent step of $(C, \sigma)$ (condition 1, let's first ignore
the frame states $\sigma_F$ and $\Sigma_F$ which will be discussed later), the
source could make $n$ steps ($n \geq 0$) correspondingly (1(b)), and the
simulation is preserved afterwards with a new metric $M'$ (1(c)).
Here we use $\_ \longrightarrow^{n} \_$ to represent $n$-step silent transitions. If
$n = 0$ in 1(b) (i.e., the source does not move), the metric must
decrease along the associated well-founded order ($M' < M$ in
1(e)), otherwise we do not have any restrictions over $M'$. We also
require that the related steps at the two levels satisfy the guarantee
condition $G^{+} * \mathsf{True}$ (1(d)), the transitive closure $G^{+}$ for the shared
part and True for the private. If the target step corresponds to no
source moves ($n = 0$), we use false as the corresponding effect bit,
otherwise the bit should be true (1(e)).

If a target step produces an event $e$, the requirements (condition
2) are similar to those in condition 1, except that we know for
sure that the target step corresponds to one or more source steps that
produce the same $e$.

The simulation should be preserved after environment
transitions satisfying $R^{+} * \mathsf{Id}$, $R^{+}$ for the shared part and Id for the
private (condition 3). If the corresponding effect bit of the
environment transition is true, we know there are one or more source
moves, therefore there are no restrictions over the metric $M'$ for the
resulting code (which could be larger than $M$). Otherwise, the
metric should be unaffected under the environment interference (i.e.,
$M' = M$ in 3(b)).

If $C$ terminates (condition 4), the corresponding $\mathbb{C}$ must also
terminate and the resulting states satisfy the postcondition $Q$.
Finally, if $C$ is unsafe, then $\mathbb{C}$ must be unsafe too (condition 5).
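
The constraints that conditions 1(e) and 3(b) place on the metric can be written as two small predicates. This is a sketch assuming the metric is instantiated with natural numbers under the usual order; the function names are hypothetical.

```python
def silent_step_ok(n, b, m_old, m_new):
    """Condition 1(e) for a silent target step matched by n source steps.

    If the source does not move (n == 0), the metric must strictly
    decrease and the effect bit must be false; otherwise the effect bit
    must be true and the new metric is unconstrained.
    """
    if n == 0:
        return m_new < m_old and b is False
    return b is True


def env_step_ok(b, m_old, m_new):
    """Condition 3(b) for an environment transition with effect bit b.

    Without a source-level move (b is false) the metric must be
    unchanged; with one, any new metric is allowed.
    """
    return b or m_new == m_old
```

Read together, the two predicates say that the metric can only grow when some source-level move happens, either by the thread itself or by its environment.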

Inspired by Vafeiadis [24], we directly embed the framing
aspect of separation logic in Def. 2. At each condition, we introduce
the frame states $\sigma_F$ and $\Sigma_F$ at the target and source levels to
represent the remaining parts of the states owned by other threads in the
system. The commands $C$ and $\mathbb{C}$ must not change the frame states
during their executions (see, e.g., conditions 1(a) and 1(b)). These
$\sigma_F$ and $\Sigma_F$ quantifications in RGSim-T are crucial to admit the
parallel compositionality and the frame rules (the B-FRAME rule in
Fig. 7 and the FRAME rule in Fig. 9).

We then define $R, G, I \models \{P\}\, C \preceq \mathbb{C}\, \{Q\}$ by hiding the initial
states via the precondition $P$ and hiding the metric $M$.
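$$\frac{R \lor G_2, G_1, I \vdash \{P_1 * P\}\, C_1 \preceq \mathbb{C}_1\, \{Q_1 * Q_1'\} \quad R \lor G_1, G_2, I \vdash \{P_2 * P\}\, C_2 \preceq \mathbb{C}_2\, \{Q_2 * Q_2'\} \quad P \lor Q_1' \lor Q_2' \Rightarrow I \quad I \triangleright R}{R, G_1 \lor G_2, I \vdash \{P_1 * P_2 * P\}\, C_1 \parallel C_2 \preceq \mathbb{C}_1 \parallel \mathbb{C}_2\, \{Q_1 * Q_2 * (Q_1' \land Q_2')\}} \;(\text{B-PAR})$$

$$\frac{P \Rightarrow (B \Leftrightarrow \mathbb{B}) * I \quad R, G, I \vdash \{P \land B\}\, C \preceq \mathbb{C}\, \{P\}}{R, G, I \vdash \{P\}\, \mathbf{while}\ (B)\ C \preceq \mathbf{while}\ (\mathbb{B})\ \mathbb{C}\, \{P \land \neg B\}} \;(\text{B-WHILE})$$

$$\frac{P \Rightarrow (E = \mathbb{E}) * I \quad \mathsf{Sta}(P, R * \mathsf{Id}) \quad I \triangleright \{R, G\}}{R, G, I \vdash \{P\}\, \mathbf{print}(E) \preceq \mathbf{print}(\mathbb{E})\, \{P\}} \;(\text{B-PRT})$$

$$\frac{R, G, I \vdash \{P\}\, C \preceq \mathbb{C}\, \{Q\} \quad \mathsf{Sta}(P', R' * \mathsf{Id}) \quad I' \triangleright \{R', G'\} \quad P' \Rightarrow I' * \mathbf{true} \quad G^{+} \Rightarrow G}{R * R', G * G', I * I' \vdash \{P * P'\}\, C \preceq \mathbb{C}\, \{Q * P'\}} \;(\text{B-FRAME})$$

Figure 7. Selected binary inference rules (compositionality of RGSim-T).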


Adequacy. RGSim-T ensures the termination-preserving
refinement by using the metric with a well-founded order. The proof of
the following adequacy theorem is in TR [13].

Theorem 3 (Adequacy of RGSim-T). If there exist $R$, $G$, $I$, $Q$
and a metric $M$ (with a well-founded order $<$) such that
$R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$, then $(C, \sigma) \sqsubseteq (\mathbb{C}, \Sigma)$.

4.3  Compositionality  Rules 

RGSim-T is compositional. We show some of the compositionality
rules in Fig. 7. Here we use $R, G, I \vdash \{P\}\, C \preceq \mathbb{C}\, \{Q\}$ for the
judgment to emphasize syntactic reasoning, whose semantics is
RGSim-T (Def. 2). The rules can be viewed as the binary version
of those in a traditional rely-guarantee-style logic (e.g., LRG [3]
and RGSep [23]).

The B-PAR rule shows the compositionality w.r.t. parallel
compositions. To verify $C_1 \parallel C_2$ is a refinement of $\mathbb{C}_1 \parallel \mathbb{C}_2$, we
verify the refinement of each thread separately. The rely condition of
each thread captures the interference from both the overall
environment ($R$) and its sibling thread ($G_1$ or $G_2$). The related steps of
$C_1 \parallel C_2$ and $\mathbb{C}_1 \parallel \mathbb{C}_2$ should satisfy either thread's guarantee. As
in LRG [3], $P_1$ and $P_2$ specify the private (relational) states of the
threads $C_1/\mathbb{C}_1$ and $C_2/\mathbb{C}_2$ respectively. The states $P$ are shared by
them. When both threads have terminated, their private states
satisfy $Q_1$ and $Q_2$, and the shared states satisfy both $Q_1'$ and $Q_2'$. We
require that the shared states are well-formed ($P$, $Q_1'$ and $Q_2'$ imply
$I$) and the overall environment transitions are fenced ($I \triangleright R$).

The B-WHILE rule requires the boolean conditions of both sides
to be evaluated to the same value. The resources needed to evaluate
them should be available in the private part of $P$. The B-FRAME
rule supports local reasoning. The frame $P'$ may contain shared
and private parts, so it should be stable w.r.t. $R' * \mathsf{Id}$ and imply
$I' * \mathbf{true}$, where $I'$ is the fence for $R'$ and $G'$ (see Fig. 6 for the
definitions of fences and stability). We also require $G$ to be closed
over transitivity. This rule is almost identical to the one in LRG [3].
Details are elided here.

We provide a few binary rules to reason about the basic program
units when they are almost identical at both sides. For instance,
the B-PRT rule relates a target print command to a source one,
requiring that they always print out the same value. For more
general refinement units, as we explained in Sec. 2, we reduce
relational verification to unary reasoning (using the U2B rule in
Fig. 9, which we will explain in the next section). Our TR [13]
contains more rules and the full soundness proofs. The soundness
theorem is shown below.

Theorem 4 (Soundness of Binary Rules).

If $R, G, I \vdash \{P\}\, C \preceq \mathbb{C}\, \{Q\}$, then $R, G, I \models \{P\}\, C \preceq \mathbb{C}\, \{Q\}$.

5.  A  Rely-Guarantee-Style  Logic  for 
Termination-Preserving  Refinement 

The  binary  inference  rules  in  Fig.  7  allow  us  to  decompose  the 
refinement  verification  of  large  programs  into  the  refinement  units’ 


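\[
w \in \mathit{Nat} \qquad D ::= \mathbb{C} \mid \bullet
\]

\[
\begin{array}{lcl}
(\sigma, w, D, \Sigma) \models P & \text{iff} & (\sigma, \Sigma) \models P \\
(\sigma, w, D, \Sigma) \models \mathsf{arem}(\mathbb{C}') & \text{iff} & D = \mathbb{C}' \\
((s, h), w, D, \Sigma) \models \mathsf{wf}(E) & \text{iff} & \exists n.\ ([\![E]\!]_s = n) \wedge (n \le w) \\
(\sigma, w, D, \Sigma) \models \lfloor p \rfloor_a & \text{iff} & \exists D'.\ (\sigma, w, D', \Sigma) \models p \\
(\sigma, w, D, \Sigma) \models \lfloor p \rfloor_w & \text{iff} & \exists w'.\ (\sigma, w', D, \Sigma) \models p \\
(\sigma, \Sigma) \models \lfloor\!\lfloor p \rfloor\!\rfloor & \text{iff} & \exists w, D.\ (\sigma, w, D, \Sigma) \models p
\end{array}
\]

\[
D_1 \perp D_2 \ \text{iff}\ (D_1 = \bullet) \vee (D_2 = \bullet)
\qquad
D_1 \uplus D_2 = \begin{cases} D_2 & \text{if } D_1 = \bullet \\ D_1 & \text{if } D_2 = \bullet \end{cases}
\]

\[
(\sigma_1, w_1, D_1, \Sigma_1) \uplus (\sigma_2, w_2, D_2, \Sigma_2) =
(\sigma_1 \uplus \sigma_2,\ w_1 + w_2,\ D_1 \uplus D_2,\ \Sigma_1 \uplus \Sigma_2),
\quad \text{if } \sigma_1 \perp \sigma_2,\ D_1 \perp D_2 \text{ and } \Sigma_1 \perp \Sigma_2
\]

\[
\begin{array}{l}
\mathsf{Sta}(p, R) \ \text{iff}\ \forall \sigma, w, D, \Sigma, \sigma', \Sigma', b. \\
\quad ((\sigma, w, D, \Sigma) \models p) \wedge (((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R) \\
\quad \Longrightarrow \exists w'.\ (\sigma', w', D, \Sigma') \models p \wedge (b = \mathbf{false} \Longrightarrow w' = w)
\end{array}
\]

\[
\begin{array}{l}
p \Rrightarrow^{0} q \ \text{iff}\ p \Rightarrow q \\
p \Rrightarrow^{+} q \ \text{iff}\ \forall \sigma, w, D, \Sigma, \Sigma_F.\ ((\sigma, w, D, \Sigma) \models p) \wedge (\Sigma \perp \Sigma_F) \Longrightarrow \\
\quad \exists w', \mathbb{C}', \Sigma'.\ (D, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F) \wedge ((\sigma, w', \mathbb{C}', \Sigma') \models q)
\end{array}
\]

Figure 8. Semantics of assertions (part II).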

verification. In this section, we explain the unary rules for verifying
refinement units. All the binary and unary rules constitute our novel
rely-guarantee-style logic for modular verification of
termination-preserving refinement.

5.1  Assertions  on  Source  Code  and  Number  of  Tokens 

We first explain the new assertions p and q used in the unary rules
that can specify the source code and metrics in addition to states.
We define their syntax in Fig. 5, and their semantics in Fig. 8. A
full state assertion p is interpreted on $(\sigma, w, D, \Sigma)$. Here, besides
the states $\sigma$ and $\Sigma$ at the target and source levels, we introduce
some auxiliary data w and D. w is the number of tokens needed for
loops (see Sec. 2). D is either some source code $\mathbb{C}$, or a special sign
$\bullet$ serving as a unit for defining the semantics of p * q below.

In Fig. 8 we lift the relational assertion P as a full state assertion
to specify the states. The new assertion arem($\mathbb{C}$) says that the
remaining source code is $\mathbb{C}$ at the current program point. wf(E)
states that the number of tokens at the current target code is no less
than E. We can see wf(0) always holds, and for any n, wf(n + 1)
implies wf(n). We use $\lfloor p \rfloor_a$ and $\lfloor p \rfloor_w$ to ignore the descriptions in
p about the source code and the number of tokens respectively.
$\lfloor\!\lfloor p \rfloor\!\rfloor$ lifts p back to a relational state assertion.

Separating conjunction p * q has the standard meaning as in
separation logic, which says p and q hold over disjoint parts of
$(\sigma, w, D, \Sigma)$ respectively (the formal definition elided here). However,
it is worth noting the definition of disjoint union over the
quadruple states, which is shown in the middle part of Fig. 8. The
disjoint union of the numbers of tokens $w_1$ and $w_2$ is simply the
sum of them. The disjoint union of $D_1$ and $D_2$ is defined only if




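\[
\dfrac{R, G, I \vdash \{P \wedge \mathsf{arem}(\mathbb{C})\}\, C\, \{Q \wedge \mathsf{arem}(\mathbf{skip})\}}
      {R, G, I \vdash \{P\}\, C \preceq \mathbb{C}\, \{Q\}}
\ (\text{U2B})
\]

\[
\dfrac{\vdash_{\mathrm{SL}} [p]\, C\, [q] \qquad
       (\lfloor\!\lfloor p \rfloor\!\rfloor \ltimes \lfloor\!\lfloor q \rfloor\!\rfloor) \Rightarrow G * \mathsf{True} \qquad
       I \triangleright G \qquad p \vee q \Rightarrow I * \mathsf{true}}
      {[I], G, I \vdash \{p\}\, \langle C \rangle\, \{q\}}
\ (\text{ATOM})
\]

\[
\dfrac{\begin{array}{c}
       p \Rrightarrow^{a} p' \qquad \vdash_{\mathrm{SL}} [p']\, C\, [q'] \qquad q' \Rrightarrow^{b} q \qquad + \in \{a, b\} \\
       (\lfloor\!\lfloor p \rfloor\!\rfloor \propto \lfloor\!\lfloor q \rfloor\!\rfloor) \Rightarrow G * \mathsf{True} \qquad
       I \triangleright G \qquad p \vee q \Rightarrow I * \mathsf{true}
       \end{array}}
      {[I], G, I \vdash \{p\}\, \langle C \rangle\, \{q\}}
\ (\text{ATOM+})
\]

\[
\dfrac{[I], G, I \vdash \{p\}\, \langle C \rangle\, \{q\} \qquad \mathsf{Sta}(\{p, q\}, R * \mathsf{Id}) \qquad I \triangleright R}
      {R, G, I \vdash \{p\}\, \langle C \rangle\, \{q\}}
\ (\text{ATOM-R})
\]

\[
\dfrac{p \Rightarrow (B = B) * I \qquad
       p \wedge B \Rightarrow p' * (\mathsf{wf}(1) \wedge \mathsf{emp}) \qquad
       R, G, I \vdash \{p'\}\, C\, \{p\}}
      {R, G, I \vdash \{p\}\ \mathbf{while}\ (B)\ C\ \{p \wedge \neg B\}}
\ (\text{WHILE})
\]

\[
\dfrac{R, G, I \vdash \{p\}\, C\, \{q\}}
      {R, G, I \vdash \{\lfloor p \rfloor_w\}\, C\, \{\lfloor q \rfloor_w\}}
\ (\text{HIDE-W})
\]

\[
\dfrac{R, G, I \vdash \{p\}\, C\, \{q\} \qquad \mathsf{Sta}(p', R' * \mathsf{Id}) \qquad
       I' \triangleright \{R', G'\} \qquad p' \Rightarrow I' * \mathsf{true} \qquad G^{+} \Rightarrow G}
      {R * R',\ G * G',\ I * I' \vdash \{p * p'\}\, C\, \{q * p'\}}
\ (\text{FRAME})
\]
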
Figure 9. Selected unary inference rules.

    1  local t;
         { x = X ∧ arem(S') ∧ wf(1) }
    2  while (true) {
         { x = X ∧ arem(S') }
    3    < t := x; >
         { (x = X = t ∧ arem(S')) ∨ (x = X ≠ t ∧ arem(S') ∧ wf(1)) }
    4    cas(&x, t, t+1);
         { x = X ∧ arem(S') ∧ wf(1) }
    5  }

       // unfolding cas
       < if (x = t)
           { x = X = t ∧ arem(S') }
           { x = X = t ∧ arem(X++; S') }
           x := t + 1;
           { x = X = t + 1 ∧ arem(S') ∧ wf(1) }
       >

    (a) looping a counter: I = (x = X),  R = G = (I ∝ I) ∨ [I]

    1  local i := 100;
         { i ≥ 0 ∧ wf(i) ∧ arem(skip) }
    2  while (i > 0) {
         { i > 0 ∧ wf(i-1) ∧ arem(skip) }
    3    i--;
         { i ≥ 0 ∧ wf(i) ∧ arem(skip) }
    4  }

    (b) local termination: I = emp,  R = G = Emp

Figure 10. Proofs for two small examples.

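at least one of them is the special sign $\bullet$, which has no knowledge
about the remaining source code $\mathbb{C}$. Therefore we know the following
holds (for any P and $\mathbb{C}$):

\[
(P \wedge \mathsf{arem}(\mathbb{C}) \wedge \mathsf{wf}(1)) * (\mathsf{wf}(1) \wedge \mathsf{emp}) \iff (P \wedge \mathsf{arem}(\mathbb{C}) \wedge \mathsf{wf}(2))
\]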

One  may  think  a  more  natural  definition  of  the  disjoint  union  is 
to  require  the  two  Ds  be  the  same.  But  this  would  break  the  FRAME 
rule  (see  Fig.  9).  For  example,  we  can  prove: 

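\[
\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \vdash \{x = X \wedge \mathsf{arem}(\texttt{X++})\}\ \texttt{x++}\ \{x = X \wedge \mathsf{arem}(\mathbf{skip})\}
\]

With the FRAME rule and the separating conjunction based on
the alternative definition of disjoint union, we would derive the
following:

\[
\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \vdash \{(x = X \wedge \mathsf{arem}(\texttt{X++})) * \mathsf{arem}(\texttt{X++})\}\ \texttt{x++}\ \{(x = X \wedge \mathsf{arem}(\mathbf{skip})) * \mathsf{arem}(\texttt{X++})\}
\]

which is reduced to an invalid judgment:

\[
\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \vdash \{x = X \wedge \mathsf{arem}(\texttt{X++})\}\ \texttt{x++}\ \{\mathbf{false}\}
\]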

We  require  in  p  *  q  that  either  p  or  q  should  not  specify  the  source 
code,  therefore  in  this  example  the  precondition  after  applying  the 
frame  rule  is  invalid  (thus  the  whole  judgment  is  valid). 
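
The disjoint union over quadruple states can be made concrete with a small executable model. The following Python sketch is illustrative only and is not part of the original development: the names `Full`, `union_code`, `union_map`, `disjoint_union` and `wf` are hypothetical, states are modelled as dictionaries, the special sign • is modelled as `None`, and `wf` takes a number rather than an expression. It shows that token counts add up under the union, and that the union is undefined as soon as both sides specify the remaining source code.

```python
from typing import NamedTuple, Optional


class Full(NamedTuple):
    """A quadruple state (sigma, w, D, Sigma); all field names are assumptions."""
    sigma: dict          # target state fragment
    w: int               # number of tokens
    D: Optional[str]     # remaining source code, or None for the unit sign
    Sigma: dict          # source state fragment


def union_code(d1, d2):
    """D1 (+) D2 is defined only if at least one side is the unit sign."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    raise ValueError("both sides specify the remaining source code")


def union_map(m1, m2):
    """Disjoint union of two finite maps; undefined on overlapping domains."""
    if m1.keys() & m2.keys():
        raise ValueError("overlapping domains")
    return {**m1, **m2}


def disjoint_union(a, b):
    """Component-wise union; the token counts are simply added."""
    return Full(union_map(a.sigma, b.sigma), a.w + b.w,
                union_code(a.D, b.D), union_map(a.Sigma, b.Sigma))


def wf(n, state):
    """wf(n) holds when the state carries at least n tokens."""
    return n <= state.w


# (P /\ arem(X++) /\ wf(1)) * (wf(1) /\ emp) yields a state satisfying wf(2).
left = Full({"x": 0}, 1, "X++", {"X": 0})
unit = Full({}, 1, None, {})
whole = disjoint_union(left, unit)
```

Under this model, combining `left` with another state whose `D` is also `"X++"` raises `ValueError`, which mirrors why the framed precondition in the example above is unsatisfiable.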

The stability of p w.r.t. an action R, defined at the bottom part of
Fig. 8, specifies how the number of tokens of a program (specified
by p) could change under R's interferences. As a simple example,
for the following p, $R_1$ and $R_2$, $\mathsf{Sta}(p, R_1)$ holds while $\mathsf{Sta}(p, R_2)$
does not hold:

\[
\begin{array}{rcl}
p & = & (10 \mapsto 0 * 20 \Mapsto 0) \vee ((10 \mapsto 1 * 20 \Mapsto 0) \wedge \mathsf{wf}(1)) \\
R_1 & = & (10 \mapsto 0 * 20 \Mapsto 0) \propto (10 \mapsto 1 * 20 \Mapsto 0) \\
R_2 & = & (10 \mapsto 0 * 20 \Mapsto 0) \ltimes (10 \mapsto 1 * 20 \Mapsto 0)
\end{array}
\]
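
The difference between the two actions can be checked mechanically on a finite model. The sketch below is an assumption-laden illustration rather than the paper's definition: the target cell 10 is reduced to a single integer, the source cell 20 is fixed to 0 and therefore omitted, token counts are bounded by a hypothetical `MAX_TOKENS`, and an action is a set of `(pre, post, effect_bit)` triples. The function `stable` follows the shape of Sta(p, R): after every transition allowed by the action, the assertion must hold for some token count, and the count must be unchanged when the effect bit is false.

```python
from itertools import product

MAX_TOKENS = 3  # hypothetical bound so that the quantifiers range over a finite set


def p(cell10, w):
    """p: cell 10 holds 0, or cell 10 holds 1 and at least one token is available."""
    return cell10 == 0 or (cell10 == 1 and w >= 1)


# Actions as sets of (pre, post, effect_bit) over the value of cell 10.
R1 = {(0, 1, True)}                    # the effect bit must be true
R2 = {(0, 1, True), (0, 1, False)}     # the effect bit is unconstrained


def stable(assertion, action):
    """Bounded check of Sta(assertion, action)."""
    for cell, w in product((0, 1), range(MAX_TOKENS + 1)):
        if not assertion(cell, w):
            continue
        for pre, post, bit in action:
            if pre != cell:
                continue
            # Some w2 must re-establish the assertion; w2 == w if bit is false.
            if not any(assertion(post, w2) and (bit or w2 == w)
                       for w2 in range(MAX_TOKENS + 2)):
                return False
    return True
```

With these definitions `stable(p, R1)` holds, whereas `stable(p, R2)` fails on the state with zero tokens: the transition with a false effect bit cannot supply the token that wf(1) demands.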

5.2  Unary  Inference  Rules 

The judgment for unary reasoning is in the form of
$R, G, I \vdash \{p\}\, C\, \{q\}$. We present some of the rules in Fig. 9.

The U2B rule, as explained in Sec. 2, turns unary proofs to
binary ones. It says that if the remaining source code is $\mathbb{C}$ at the
beginning of the target C, and it becomes skip at the end of C,
then we know C is simulated by $\mathbb{C}$.

The ATOM rule allows us to reason sequentially about the target
code in the atomic block. We use $\vdash_{\mathrm{SL}} [p]\, C\, [q]$ to represent the total
correctness of C in sequential separation logic. The corresponding
rules are mostly standard and elided here. Note that C only accesses
the target state $\sigma$, therefore in our sequential rules we require
the source state $\Sigma$ and the auxiliary data w and D in p should
remain unchanged in q. We can lift C's total correctness to the
concurrent setting as long as its overall transition over the shared
states satisfies the guarantee G. Here we assume the environment
is identity transitions. To allow general environment behaviors, we
can apply the ATOM-R rule later, which requires that R be fenced
by I and the pre- and post-conditions be stable w.r.t. R.

The ATOM+ rule is similar to the ATOM rule, except that it
executes the source code simultaneously with the target atomic
step. We use $p \Rrightarrow^{+} q$ for the multi-step executions from the source
code specified by p to the code specified by q, which is defined
in the bottom part of Fig. 8. We also write $p \Rrightarrow^{0} q$ for the usual
implication $p \Rightarrow q$. Then, the ATOM+ rule says that we can execute the
source code before or after the steps of C, as long as the overall
transition (including the source steps and the target steps) with the
effect bit true satisfies G for the shared parts.

The WHILE rule is the key to proving the preservation of termination.
As we informally explained in Sec. 2, we should be able to
decrease the number of tokens at the beginning of each loop iteration.
And we should re-establish the invariant p between the states
and the number of tokens at the end of each iteration. Below we
give two examples, each of which shows a typical application of
the WHILE rule.

Examples. The first example is the $T''$ and $S'$ in Sec. 2. We show
its proof in our logic in Fig. 10(a) (for simplicity, below we always
assume the ownership of variables). We use X for the counter at
the source, and the rely/guarantee conditions say that the counters
at the two levels can be updated simultaneously with the effect bit
true. The loop invariant above line 2 says that we should have at
least one token to execute the loop. The loop body is verified with
zero tokens, and should finally restore the invariant token number
1. The gaining of the token may be due to a successful cas at line 4
that corresponds to source steps, or caused by the environment
interferences. More specifically, the assertion following line 3 says
that we can gain a token if the counters have been updated. If the
counters are not updated before the cas at line 4, the cas succeeds
and we show the detailed proof at the right part of Fig. 10(a), in
which we execute one iteration of the source code and gain a token
(applying the ATOM+ rule).
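
The token argument for the cas loop rests on one observation: a cas can only fail if some other thread's cas succeeded after the failing thread read the counter. The following Python sketch replays that observation on a randomly scheduled interleaving. It is a simulation under stated assumptions, not a proof: the function name `run`, the scheduler, and the bookkeeping variables are all hypothetical, and each scheduled step performs either the atomic read at line 3 or the cas at line 4 of one thread.

```python
import random


def run(num_threads=3, steps=200, seed=0):
    """Interleave `t := x` and `cas(&x, t, t+1)` steps of several threads."""
    rng = random.Random(seed)
    x = 0
    local_t = [None] * num_threads        # None: the thread is about to read x
    successes_at_read = [0] * num_threads
    successes = 0                          # successful cas steps so far
    failures_explained = True
    for _ in range(steps):
        i = rng.randrange(num_threads)
        if local_t[i] is None:
            # Line 3: atomic read of the shared counter.
            local_t[i] = x
            successes_at_read[i] = successes
        else:
            # Line 4: cas(&x, t, t+1).
            if x == local_t[i]:
                x += 1                     # own progress: a source step is executed
                successes += 1
            else:
                # Failure: the environment must have updated the counter,
                # which is where the thread regains its token.
                failures_explained &= successes > successes_at_read[i]
            local_t[i] = None
    return x, successes, failures_explained
```

In every run the final counter equals the number of successful cas steps, and every failed cas is matched by environment progress, which is the informal content of the disjunction in the assertion following line 3.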

This  example  shows  the  most  straightforward  understanding  of 
the  WHILE  rule:  we  pay  a  token  at  the  beginning  of  an  iteration 
and  should  be  able  to  gain  another  token  during  the  execution  of 
the  iteration.  The  next  example  is  more  subtle  (though  simpler). 
As  shown  in  Fig.  10(b),  it  is  a  locally-terminating  while  loop  (i.e., 
a  loop  that  terminates  regardless  of  environment  interferences). 
We  prove  it  refines  skip  under  the  environment  Emp.  The  loop 
invariant  above  line  2  says  that  the  number  of  tokens  equals  the 
value  of  i.  If  the  loop  condition  (i>0)  is  satisfied,  we  pay  one 
token.  In  the  proof  of  the  loop  body,  we  do  not  (and  are  not  able 
to)  gain  more  tokens.  Instead,  the  value  of  i  will  be  decreased  in 
the  iteration,  enabling  us  to  restore  the  equality  between  the  number 
of  tokens  and  i. 
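
The token accounting of Fig. 10(b) can be replayed directly. The sketch below is a plain Python transliteration of the loop with an explicit token counter; the function name `run_local_loop` and the variable `tokens` are hypothetical, and the assertions mirror the proof outline rather than the formal rule. One token is paid on entry to each iteration, none is gained, and the invariant relating the tokens to i is restored only because i decreases.

```python
def run_local_loop(start=100):
    """Run `while (i > 0) i--` while tracking tokens as in the proof outline."""
    i = start
    tokens = i                       # loop invariant: i >= 0 and wf(i)
    iterations = 0
    while i > 0:
        assert tokens >= 1           # a token must be available to enter the body
        tokens -= 1                  # pay one token: now i > 0 and wf(i - 1)
        i -= 1                       # line 3 of Fig. 10(b)
        assert i >= 0 and tokens >= i    # invariant restored without gaining tokens
        iterations += 1
    return iterations, tokens
```

Starting from 100, the loop runs exactly 100 iterations and ends with no tokens left, so the initial token count is an upper bound on the number of iterations.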

Other rules and discussions. Another important rule is the
HIDE-W rule in Fig. 9. It shows that tokens are just an auxiliary
tool, which could be safely discarded (by using $\lfloor \_ \rfloor_w$) when the
termination-preservation of a command C (say, a while loop) is
already established. As we mentioned in Sec. 2, the HIDE-W rule
is crucial to handle infinite nondeterminism. It is also important for
local reasoning, so that when we verify a thread, we do not have
to calculate and specify in the precondition the number of tokens
needed by all the while loops. For nested loops, we could use the
HIDE-W rule to hide the tokens needed by the inner loop, and use
the FRAME rule to add back the tokens needed for the outer loop
later when we compose the inner loop with other parts of the outer
loop body.

The  unary  FRAME  rule  in  Fig.  9  is  similar  to  the  binary  one  in 
Fig.  7.  Other  rules  can  be  found  in  our  TR  [13],  which  are  very 
similar  to  those  in  LRG  [3],  but  we  give  different  interpretations  to 
assertions  and  actions. 

The binary rules (in Fig. 7) and the unary rules (in Fig. 9) give
us a full proof theory for termination-preserving refinement. We
want to remind the readers that the logic does not ensure termination
of programs, therefore it is not a logic for total correctness. On
the other hand, if we restrict the source code to skip (which always
terminates), then our unary rules can be viewed as a proof theory
for the total correctness of concurrent programs.

Also  note  that  the  use  of  a  natural  number  w  as  the  while- 
specific  metric  is  to  simplify  the  presentation  only.  It  is  easy  to 
extend  our  work  to  support  other  types  of  the  while-specific  metrics 
for  more  complicated  examples. 


6.  More  Examples 

We  have  seen  a  few  small  examples  that  illustrate  the  use  of  our 
logic,  in  particular,  the  WHILE  rule.  In  this  section,  we  discuss  other 
examples  that  we  have  proved,  which  are  summarized  in  Fig.  11. 
Their  proofs  are  in  TR  [13]. 


    Category                          Verified examples
    --------------------------------  ----------------------------------
    Linearizability & Lock-Freedom    Counter and its variants
                                      Treiber stack [20]
                                      Michael-Scott lock-free queue [14]
                                      DGLM lock-free queue [2]
    Non-Atomic Object Correctness     Synchronous queue [16]
    Correctness of Optimized Algo     Counter vs. its variants
    (Equivalence)                     TAS lock vs. TTAS lock [6]

Figure 11. Verified examples using our logic.


Proving linearizability and lock-freedom together for concurrent
objects. It has been shown [12] that the verification of linearizability
and lock-freedom together can be reduced to verifying a
contextual refinement that preserves the termination of any client
programs. That is, for any client context $\mathcal{K}$, the termination-preserving
refinement $\mathcal{K}[C] \sqsubseteq \mathcal{K}[\mathbb{C}]$ should hold. Here we use C
for the concrete implementation of the object, and $\mathbb{C}$ for the corresponding
abstract atomic operations. $\mathcal{K}[C]$ (or $\mathcal{K}[\mathbb{C}]$) denotes the
whole program where the client accesses the object via method
calls to C (or $\mathbb{C}$).

The compositionality rules of our logic (Fig. 7) allow us to verify
the above contextual refinement by proving
$R, G, I \vdash \{P\}\, C \preceq \mathbb{C}\, \{Q\}$. Then we apply the U2B rule and turn the relational
verification to unary reasoning. As in a normal linearizability proof
(e.g., [10, 23]), we need to find a single step of C (i.e., the
linearization point) that corresponds to the atomic step of $\mathbb{C}$. Here we
also have to prove lock-freedom: the failure to make progress (i.e.,
finish an abstract operation) of a thread must be caused by successful
progress of its environment, which can be ensured by the WHILE
rule (in Fig. 9) in our logic.

We have used the above approach to verify several linearizable
and lock-free objects, including Treiber stack [20], Michael-Scott
lock-free queue [14] and DGLM queue [2]. We can further
extend the logic in this paper with the techniques [10] for verifying
linearizability of algorithms with non-fixed linearization points,
to support more sophisticated examples such as HSY elimination-based
stack and Harris-Michael lock-free list.

Verifying concurrent objects whose abstract operations are not
atomic. Sometimes we cannot define single atomic operations as
the abstract specification of a concurrent object. For objects that
implement synchronization between threads, we may have to explicitly
take into account the interferences from other threads when
defining the abstract behaviors of the current thread. For example,
the synchronous queue [16] is a concurrent transfer channel in
which each producer presenting an item must wait for a consumer
to take this item, and vice versa. The corresponding abstract operations
are no longer atomic. We used our logic to prove the contextual
refinement between the concrete implementation (from [16],
used in Java 6) and a more abstract synchronous queue. The refinement
ensures that if a producer (or a consumer) is blocked at the
concrete level, it must also be blocked at the source level.

Proving equivalence between optimized algorithms and original
ones. We also use our logic to show variants of concurrent algorithms
are correct optimizations of the original implementations.
In this case, we show equivalence (in fact, contextual equivalence),
i.e., refinements of both directions.

For instance, we proved the TTAS lock implementation is
equivalent to the TAS lock implementation [6] for any client using
the locks. The former tests the lock bit in a nested while loop until it
appears to be free, and then uses the atomic getAndSet instruction
to update the bit; while the latter directly tries getAndSet until
success. The equivalence result between these two lock implementations
shows that no client may observe their differences, including
the differences on their termination behaviors (e.g., whether a
client thread may acquire the lock). It gives us the full correctness
of the TTAS lock. As an optimization of TAS lock, it preserves the
behaviors on both functionality and termination of the latter.
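
To make the two lock algorithms concrete, the following Python sketch writes both against a simulated atomic bit. Everything here is an assumption made for illustration: the class names `AtomicBool`, `TASLock` and `TTASLock`, the helper `count_with`, and the use of an internal mutex to model the atomicity of getAndSet. The small client only exercises mutual exclusion on one workload; it does not establish the contextual equivalence discussed above.

```python
import threading
import time


class AtomicBool:
    """Simulated atomic bit; the internal mutex only models hardware atomicity."""

    def __init__(self):
        self._value = False
        self._mutex = threading.Lock()

    def get(self):
        with self._mutex:
            return self._value

    def set(self, new):
        with self._mutex:
            self._value = new

    def get_and_set(self, new):
        with self._mutex:
            old = self._value
            self._value = new
            return old


class TASLock:
    """Test-and-set: retry getAndSet directly until it succeeds."""

    def __init__(self):
        self.bit = AtomicBool()

    def lock(self):
        while self.bit.get_and_set(True):
            time.sleep(0)            # yield to other threads while spinning

    def unlock(self):
        self.bit.set(False)


class TTASLock:
    """Test-and-test-and-set: spin on plain reads, then try getAndSet."""

    def __init__(self):
        self.bit = AtomicBool()

    def lock(self):
        while True:
            while self.bit.get():    # nested loop: wait until the bit looks free
                time.sleep(0)
            if not self.bit.get_and_set(True):
                return

    def unlock(self):
        self.bit.set(False)


def count_with(lock, threads=3, per_thread=100):
    """A client that increments a shared counter inside the critical section."""
    total = 0

    def worker():
        nonlocal total
        for _ in range(per_thread):
            lock.lock()
            total += 1
            lock.unlock()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return total
```

Both `count_with(TASLock())` and `count_with(TTASLock())` return the same total for this client, which is one observable instance of the behaviour the equivalence result guarantees for all clients.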

7.  Related  Work  and  Conclusion 

Hoffmann  et  al.  [7]  propose  a  program  logic  to  verify  lock-freedom 
of  concurrent  objects.  They  reason  about  termination  quantitatively 
by  introducing  tokens,  and  model  the  environment’s  interference 
over  the  current  thread’s  termination  in  terms  of  token  transfer.  The 
idea  is  simple  and  natural,  but  their  logic  has  very  limited  support 
of  local  reasoning.  One  needs  to  know  the  total  number  of  tokens 
needed  by  each  thread  (which  may  have  multiple  while  loops)  and 
the  (fixed)  number  of  threads,  to  calculate  the  number  of  tokens  for 
a  thread  to  lose  or  initially  own.  This  requirement  also  disallows 
their  logic  to  reason  about  programs  with  infinite  nondeterminism. 
Here we allow a thread to set its effect bit in R/G without knowing
the  details  of  other  threads;  and  other  threads  can  determine  by 
themselves  how  many  tokens  they  gain.  We  also  introduce  the 
HIDE-W  rule  to  hide  the  number  of  tokens  and  to  support  infinite 
nondeterminism.  Another  key  difference  is  that  our  logic  supports 
verification  of  refinement,  which  is  not  supported  by  their  logic. 

Gotsman et al. [5] propose program logic and tools to verify
lock-freedom. Their approach is more heavyweight in that they
need temporal assertions in the rely/guarantee conditions to specify
interference between threads, and the rely/guarantee conditions
need to be specified iteratively in multiple rounds to break circular
reliance on progress. Moreover, their work relies on third-party
tools to check termination of individual threads as closed sequential
programs. Therefore they do not have a set of self-contained program
logic rules and a coherent meta-theory as we do. Like Hoffmann
et al. [7], they do not support refinement verification either.

As we explained in Sec. 1, none of recent work on general
refinement verification of concurrent programs [11, 21, 22] and
on verifying linearizability of concurrent objects [10, 23] (which
can be viewed as a specialized refinement problem) preserves
termination. Ševčík et al. equipped their simulation proofs for
CompCertTSO [17] with a well-founded order, following the
CompCert approach. Their approach is similar to our second attempt
explained in Sec. 2, thus cannot be applied to prove
lock-freedom of concurrent objects.

Conclusion and future work. We propose a new compositional
simulation RGSim-T to verify termination-preserving refinement
between concurrent programs. We also give a rely/guarantee program
logic as a proof theory for the simulation. Our logic is the first
to support compositional verification of termination-preserving
refinement. The simulation and logic are general. They can be used
to verify both correctness of optimizations (where the source may
not necessarily terminate) and lock-freedom of concurrent objects.
As future work, we would like to further extend them with the techniques
of pending thread pools and speculations [10] to verify objects
with non-fixed linearization points. We also hope to explore
the possibility of building tools to automate the verification.

NOTES: This TR is a supplement to our CSL-LICS'14 paper. It includes full formulations of the
technical settings (Section 1), our RGSim-T definitions (Section 2), the full program logic, all
the examples we have verified, and the full formal soundness proofs.

Moreover, we introduce a new assertion $p \oplus q$, which allows local reasoning about the
number of tokens that is conditional upon the shared state at runtime. The TR gives its semantics,
the related local reasoning rule, and its use in practical examples.

We also provide a transitivity rule on the binary judgments. We introduce new assertions to specify
the compositions of two relational assertions and of two actions.

For more informal explanations and the high-level picture, please see our CSL-LICS'14 paper. Both
the paper and this companion TR can be found at the following url:

http://kyhcs.ustcsz.edu.cn/relconcur/rgsimt

1  Basic Technical Settings and Termination-Preserving Refinement

1.1  The  Language 

We show the language in Figure 1. We assume the program variables used in the target code are different
from the ones used in the source (e.g., we use x and X for target and source level variables respectively).


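\[
\begin{array}{llcl}
(\textit{Event}) & e & & \\
(\textit{Label}) & \iota & ::= & e \mid \tau \\
(\textit{Store}) & s, \mathbb{s} & \in & \mathit{PVar} \rightharpoonup \mathit{Val} \\
(\textit{Heap}) & h, \mathbb{h} & \in & \mathit{Addr} \rightharpoonup \mathit{Val} \\
(\textit{State}) & \sigma, \Sigma & ::= & (s, h) \\
(\textit{Instr}) & c, \mathbb{c} & \in & \mathit{State} \rightharpoonup \mathcal{P}((\mathit{Label} \times \mathit{State}) \cup \{\mathbf{abort}\}) \\
(\textit{Expr}) & E, \mathbb{E} & ::= & x \mid n \mid E + E \mid \ldots \\
(\textit{BExp}) & B, \mathbb{B} & ::= & \mathbf{true} \mid \mathbf{false} \mid E = E \mid\ !B \mid \ldots \\
(\textit{Stmt}) & C, \mathbb{C} & ::= & \mathbf{skip} \mid c \mid \langle C \rangle \mid C_1; C_2 \mid \mathbf{if}\ (B)\ C_1\ \mathbf{else}\ C_2 \\
 & & \mid & \mathbf{while}\ (B)\ C \mid C_1 \parallel C_2
\end{array}
\]

Figure 1: Generic language at target and source levels.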

We show the operational semantics in Figure 2. The semantics of E and B are defined by $[\![E]\!]$
and $[\![B]\!]$ respectively. $[\![E]\!]$ is a partial function of type $\mathit{Store} \rightharpoonup \mathit{Val}$. $[\![B]\!]$ is a partial function of type
$\mathit{Store} \rightharpoonup \{\mathbf{true}, \mathbf{false}\}$. They are undefined if variables in E and B are not assigned values in the store
s. Their definitions are omitted here.

Conventions. We usually write blackboard bold or capital letters ($\mathbb{s}$, $\mathbb{h}$, $\Sigma$, $\mathbb{c}$, $\mathbb{E}$, $\mathbb{B}$ and $\mathbb{C}$) for the
notations at the source level to distinguish from the target-level ones (s, h, $\sigma$, c, E, B and C). When we
discuss the transitivity, we use $\theta$ and $C_M$ for the state and the code at the middle level.

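Below we use $\_ \longrightarrow^{*} \_$ for zero or multiple-step transitions with no events generated, $\_ \longrightarrow^{+} \_$ for
multiple-step transitions without events, $\_ \stackrel{e}{\longrightarrow}{}^{+} \_$ for multiple-step transitions with only one event e
generated, and $\_ \longrightarrow^{\omega} \cdot$ for an infinite execution without events.

\[
\dfrac{(\iota, \sigma') \in c\ \sigma}{(c, \sigma) \stackrel{\iota}{\longrightarrow} (\mathbf{skip}, \sigma')}
\qquad
\dfrac{\mathbf{abort} \in c\ \sigma}{(c, \sigma) \longrightarrow \mathbf{abort}}
\qquad
\dfrac{\sigma \notin \mathit{dom}(c)}{(c, \sigma) \longrightarrow (c, \sigma)}
\]

\[
\dfrac{(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')}{(\langle C \rangle, \sigma) \longrightarrow (\mathbf{skip}, \sigma')}
\qquad
\dfrac{(C, \sigma) \longrightarrow^{*} \mathbf{abort}}{(\langle C \rangle, \sigma) \longrightarrow \mathbf{abort}}
\qquad
\dfrac{(C, \sigma) \longrightarrow^{\omega} \cdot}{(\langle C \rangle, \sigma) \longrightarrow (\langle C \rangle, \sigma)}
\]

\[
\dfrac{(C, \sigma) \stackrel{\iota}{\longrightarrow} (C', \sigma')}{(C; C'', \sigma) \stackrel{\iota}{\longrightarrow} (C'; C'', \sigma')}
\qquad
\dfrac{}{(\mathbf{skip}; C', \sigma) \longrightarrow (C', \sigma)}
\qquad
\dfrac{(C, \sigma) \longrightarrow \mathbf{abort}}{(C; C', \sigma) \longrightarrow \mathbf{abort}}
\]

\[
\dfrac{[\![B]\!]_s = \mathbf{true}}{(\mathbf{while}\ (B)\ C, (s, h)) \longrightarrow (C; \mathbf{while}\ (B)\ C, (s, h))}
\qquad
\dfrac{[\![B]\!]_s = \mathbf{false}}{(\mathbf{while}\ (B)\ C, (s, h)) \longrightarrow (\mathbf{skip}, (s, h))}
\]

\[
\dfrac{[\![B]\!]_s\ \text{undefined}}{(\mathbf{while}\ (B)\ C, (s, h)) \longrightarrow \mathbf{abort}}
\qquad
\dfrac{[\![B]\!]_s\ \text{undefined}}{(\mathbf{if}\ (B)\ C_1\ \mathbf{else}\ C_2, (s, h)) \longrightarrow \mathbf{abort}}
\]

\[
\dfrac{[\![B]\!]_s = \mathbf{true}}{(\mathbf{if}\ (B)\ C_1\ \mathbf{else}\ C_2, (s, h)) \longrightarrow (C_1, (s, h))}
\qquad
\dfrac{[\![B]\!]_s = \mathbf{false}}{(\mathbf{if}\ (B)\ C_1\ \mathbf{else}\ C_2, (s, h)) \longrightarrow (C_2, (s, h))}
\]

\[
\dfrac{(C_1, \sigma) \stackrel{\iota}{\longrightarrow} (C_1', \sigma')}{(C_1 \parallel C_2, \sigma) \stackrel{\iota}{\longrightarrow} (C_1' \parallel C_2, \sigma')}
\qquad
\dfrac{(C_2, \sigma) \stackrel{\iota}{\longrightarrow} (C_2', \sigma')}{(C_1 \parallel C_2, \sigma) \stackrel{\iota}{\longrightarrow} (C_1 \parallel C_2', \sigma')}
\]

\[
\dfrac{}{(\mathbf{skip} \parallel \mathbf{skip}, \sigma) \longrightarrow (\mathbf{skip}, \sigma)}
\qquad
\dfrac{(C_1, \sigma) \longrightarrow \mathbf{abort} \ \text{ or } \ (C_2, \sigma) \longrightarrow \mathbf{abort}}{(C_1 \parallel C_2, \sigma) \longrightarrow \mathbf{abort}}
\]

Figure 2: Operational semantics.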
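
The sequential fragment of these rules is small enough to run. The Python sketch below is an illustrative interpreter under stated assumptions, not the formal semantics: statements are encoded as tuples, expressions and conditions are Python callables over a store dictionary, an undefined variable (a `KeyError`) is treated as the undefined case that leads to abort, and atomic blocks, events, and parallel composition are omitted. The names `SKIP`, `step` and `run` are hypothetical.

```python
SKIP = ("skip",)


def step(stmt, store):
    """One small step: returns (stmt', store') or the string "abort"."""
    kind = stmt[0]
    if kind == "assign":
        _, var, expr = stmt
        try:
            value = expr(store)
        except KeyError:               # expression undefined in this store
            return "abort"
        return SKIP, {**store, var: value}
    if kind == "seq":
        _, first, rest = stmt
        if first == SKIP:              # (skip; C', s) --> (C', s)
            return rest, store
        result = step(first, store)
        if result == "abort":
            return "abort"
        first_next, store_next = result
        return ("seq", first_next, rest), store_next
    if kind in ("while", "if"):
        try:
            test = stmt[1](store)
        except KeyError:               # condition undefined: abort
            return "abort"
        if kind == "while":
            # Unfold one iteration when the condition holds, else finish.
            return (("seq", stmt[2], stmt), store) if test else (SKIP, store)
        return (stmt[2] if test else stmt[3]), store
    raise ValueError(f"unknown statement: {kind}")


def run(stmt, store, fuel=10_000):
    """Iterate `step` until skip or abort; `fuel` bounds the number of steps."""
    steps = 0
    while stmt != SKIP:
        if steps == fuel:
            raise RuntimeError("out of fuel")
        result = step(stmt, store)
        if result == "abort":
            return "abort", steps
        stmt, store = result
        steps += 1
    return store, steps


# i := 3; while (i > 0) i := i - 1
countdown = ("seq",
             ("assign", "i", lambda s: 3),
             ("while", lambda s: s["i"] > 0,
              ("assign", "i", lambda s: s["i"] - 1)))
```

Running `run(countdown, {})` drives the loop through the unfolding rule once per iteration, while a loop whose condition mentions an unassigned variable aborts on its first step.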




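1.2  Termination-Preserving Event Trace Refinement

\[
(\textit{EvtTrace}) \quad \mathcal{E} ::= \Downarrow \mid \lightning \mid \epsilon \mid e :: \mathcal{E} \qquad \text{(co-inductive interpretation)}
\]

We define $\mathit{ETr}(C, \sigma, \mathcal{E})$ in Figure 3.

\[
\dfrac{(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')}{\mathit{ETr}(C, \sigma, \Downarrow)}
\qquad
\dfrac{(C, \sigma) \longrightarrow^{+} \mathbf{abort}}{\mathit{ETr}(C, \sigma, \lightning)}
\]

\[
\dfrac{(C, \sigma) \longrightarrow^{+} (C', \sigma') \qquad \mathit{ETr}(C', \sigma', \epsilon)}{\mathit{ETr}(C, \sigma, \epsilon)}
\qquad
\dfrac{(C, \sigma) \stackrel{e}{\longrightarrow}{}^{+} (C', \sigma') \qquad \mathit{ETr}(C', \sigma', \mathcal{E})}{\mathit{ETr}(C, \sigma, e :: \mathcal{E})}
\]

Figure 3: Co-inductive definition of $\mathit{ETr}(C, \sigma, \mathcal{E})$.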


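Definition 1 (Termination-Preserving Refinement).

$(C, \sigma) \sqsubseteq (\mathbb{C}, \Sigma)$ iff $\forall \mathcal{E}.\ \mathit{ETr}(C, \sigma, \mathcal{E}) \Longrightarrow \mathit{ETr}(\mathbb{C}, \Sigma, \mathcal{E})$.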
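
For deterministic, finite-state programs the refinement of Definition 1 can be checked by comparing single traces. The sketch below is a bounded illustration under explicit assumptions: a program is a dictionary from state names to `"skip"`, `"abort"`, `("tau", next)` or `("event", e, next)`; a trace is reported as the emitted events plus an ending that records whether the run terminates, aborts, or diverges silently; and runs that emit more than `max_events` events are cut off and marked as truncated. The names `event_trace` and `refines` are hypothetical.

```python
def event_trace(program, start, max_events=20):
    """Return (events, ending) for a deterministic finite-state program."""
    events = []
    state = start
    seen_since_last_event = set()
    while True:
        action = program[state]
        if action == "skip":
            return tuple(events), "terminates"
        if action == "abort":
            return tuple(events), "aborts"
        if action[0] == "event":
            events.append(action[1])
            if len(events) >= max_events:
                return tuple(events), "truncated"
            seen_since_last_event = set()
            state = action[2]
        else:
            # Silent step: revisiting a state with no event in between means
            # the deterministic program loops forever without further events.
            if state in seen_since_last_event:
                return tuple(events), "diverges"
            seen_since_last_event.add(state)
            state = action[1]


def refines(target, source, start="init"):
    """With one trace per program, trace inclusion reduces to trace equality."""
    return event_trace(target, start) == event_trace(source, start)


skip_prog = {"init": "skip"}
silent_loop = {"init": ("tau", "init")}
print_then_stop = {"init": ("event", "a", "done"), "done": "skip"}
print_then_loop = {"init": ("event", "a", "spin"), "spin": ("tau", "spin")}
```

Under this reading, `silent_loop` does not refine `skip_prog` even though neither produces an event, and `print_then_loop` does not refine `print_then_stop`: the ending of the trace is what makes the refinement termination-preserving.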


Footnote: We made a typo in the definition of ETr in our published paper. In the paper, the third rule is as follows.

\[
\dfrac{\mathit{ETr}(C', \sigma', \mathcal{E})}{\mathit{ETr}(C, \sigma, \mathcal{E})}
\]

Such a definition is incorrect because it allows any event trace to be an acceptable trace of while (true){skip}. We
corrected it by restricting the trace of an infinite loop to be empty, as shown in Figure 3.




2  RGSim-T 


2.1  Assertion  Language 

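We first define the assertions used in our simulation RGSim-T and our program logic. Their syntax is
shown in Figure 4 and their semantics is shown in Figures 5 and 6.

\[
\begin{array}{llcl}
(\textit{RelAssn}) & P, Q, I & ::= & B \mid \mathsf{own}(x) \mid \mathrm{emp} \mid \mathbb{emp} \mid E \mapsto E \mid \mathbb{E} \Mapsto \mathbb{E} \\
 & & \mid & \lfloor\!\lfloor p \rfloor\!\rfloor \mid P * Q \mid P \vee Q \mid P \wedge Q \mid P \fatsemi Q \mid \ldots \\
(\textit{FullAssn}) & p, q & ::= & P \mid \mathsf{arem}(\mathbb{C}) \mid \mathsf{wf}(E) \mid \lfloor p \rfloor_a \mid \lfloor p \rfloor_w \\
 & & \mid & p * q \mid p \vee q \mid p \wedge q \mid p \oplus q \mid \ldots \\
(\textit{RelAct}) & R, G & ::= & P \propto Q \mid P \ltimes Q \mid [P] \mid R * R \mid R^{+} \\
 & & \mid & R \vee R \mid R \wedge R \mid R \fatsemi_{\wedge} R \mid R \fatsemi_{\vee} R \mid \ldots
\end{array}
\]

Figure 4: Assertion language.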

The above assertion language extends the one in our CSL-LICS paper with the following new assertions.

1.  $p \oplus q$, which is like a conjunction over the concrete and the abstract states and like a separating
conjunction over the number of tokens and the abstract code. It would be useful to simplify the
verification of some specific examples.

2.  $P \fatsemi Q$ and $R \fatsemi R$, which are compositions of two relational assertions and of two actions. They
are used in the transitivity of the binary judgments (the TRANS rule). We use $\theta$ and
$C_M$ to represent the middle-level state and the middle-level code respectively. We also define a
predicate MPrecise(P, Q) in Figure 5, which specifies the precise property about the middle-level
states. Here P and Q are relational assertions between low-level and middle-level states and between
middle-level and high-level states respectively.

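Note that our logic is already very useful without the above extensions. All the examples that we
mentioned in our CSL-LICS'14 paper can be verified without these extensions.

\[
\begin{array}{l}
f_1 \perp f_2 \ \text{iff}\ (\mathit{dom}(f_1) \cap \mathit{dom}(f_2) = \emptyset) \\
(s_1, h_1) \perp (s_2, h_2) \ \text{iff}\ (s_1 \perp s_2) \wedge (h_1 \perp h_2) \\
(s_1, h_1) \uplus (s_2, h_2) \stackrel{\mathrm{def}}{=}
  \begin{cases} (s_1 \cup s_2,\ h_1 \cup h_2) & \text{if } (s_1, h_1) \perp (s_2, h_2) \\ \text{undefined} & \text{otherwise} \end{cases}
\end{array}
\]

\[
\begin{array}{lcl}
((s, h), (\mathbb{s}, \mathbb{h})) \models B & \text{iff} & [\![B]\!]_{s \uplus \mathbb{s}} = \mathbf{true} \\
((s, h), (\mathbb{s}, \mathbb{h})) \models \mathsf{own}(x) & \text{iff} & \mathit{dom}(s \uplus \mathbb{s}) = \{x\} \\
((s, h), (\mathbb{s}, \mathbb{h})) \models \mathrm{emp} & \text{iff} & (\mathit{dom}(s) = \emptyset) \wedge (\mathit{dom}(h) = \emptyset) \\
((s, h), (\mathbb{s}, \mathbb{h})) \models \mathbb{emp} & \text{iff} & (\mathit{dom}(\mathbb{s}) = \emptyset) \wedge (\mathit{dom}(\mathbb{h}) = \emptyset) \\
((s, h), (\mathbb{s}, \mathbb{h})) \models E_1 \mapsto E_2 & \text{iff} & \exists l, n.\ [\![E_1]\!]_{s \uplus \mathbb{s}} = l \wedge [\![E_2]\!]_{s \uplus \mathbb{s}} = n \wedge \mathit{dom}(h) = \{l\} \wedge h(l) = n \\
((s, h), (\mathbb{s}, \mathbb{h})) \models \mathbb{E}_1 \Mapsto \mathbb{E}_2 & \text{iff} & \exists l, n.\ [\![\mathbb{E}_1]\!]_{s \uplus \mathbb{s}} = l \wedge [\![\mathbb{E}_2]\!]_{s \uplus \mathbb{s}} = n \wedge \mathit{dom}(\mathbb{h}) = \{l\} \wedge \mathbb{h}(l) = n \\
\mathsf{emp} & \stackrel{\mathrm{def}}{=} & \mathrm{emp} \wedge \mathbb{emp} \\
(\sigma, \Sigma) \models P \fatsemi Q & \text{iff} & \exists \theta.\ (\sigma, \theta) \models P \wedge (\theta, \Sigma) \models Q
\end{array}
\]

\[
\begin{array}{l}
((\sigma, \Sigma), (\sigma', \Sigma'), b) \models P \propto Q \ \text{iff}\ (\sigma, \Sigma) \models P \wedge (\sigma', \Sigma') \models Q \wedge (b = \mathbf{true}) \\
((\sigma, \Sigma), (\sigma', \Sigma'), b) \models P \ltimes Q \ \text{iff}\ (\sigma, \Sigma) \models P \wedge (\sigma', \Sigma') \models Q \\
((\sigma, \Sigma), (\sigma', \Sigma'), b) \models [P] \ \text{iff}\ (\sigma, \Sigma) \models P \wedge (\sigma = \sigma') \wedge (\Sigma = \Sigma') \\
((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R_1 * R_2 \ \text{iff} \\
\quad \exists \sigma_1, \Sigma_1, \sigma_2, \Sigma_2, \sigma_1', \Sigma_1', \sigma_2', \Sigma_2'.\
  ((\sigma_1, \Sigma_1), (\sigma_1', \Sigma_1'), b) \models R_1 \wedge ((\sigma_2, \Sigma_2), (\sigma_2', \Sigma_2'), b) \models R_2 \\
\quad \wedge\ (\sigma = \sigma_1 \uplus \sigma_2) \wedge (\sigma' = \sigma_1' \uplus \sigma_2') \wedge (\Sigma = \Sigma_1 \uplus \Sigma_2) \wedge (\Sigma' = \Sigma_1' \uplus \Sigma_2') \\
((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R^{+} \ \text{iff} \\
\quad (((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R) \\
\quad \vee\ (\exists \sigma'', \Sigma'', b', b''.\ (((\sigma, \Sigma), (\sigma'', \Sigma''), b') \models R) \wedge (((\sigma'', \Sigma''), (\sigma', \Sigma'), b'') \models R^{+}) \wedge (b = b' \vee b''))
\end{array}
\]

\[
\mathsf{Id} \stackrel{\mathrm{def}}{=} [\mathsf{true}] \qquad
\mathsf{Emp} \stackrel{\mathrm{def}}{=} \mathsf{emp} \ltimes \mathsf{emp} \qquad
\mathsf{True} \stackrel{\mathrm{def}}{=} \mathsf{true} \ltimes \mathsf{true}
\]

\[
\begin{array}{l}
((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R_1 \fatsemi_{\wedge} R_2 \ \text{iff} \\
\quad \exists \theta, \theta', b_1, b_2.\ ((\sigma, \theta), (\sigma', \theta'), b_1) \models R_1 \wedge ((\theta, \Sigma), (\theta', \Sigma'), b_2) \models R_2 \wedge (b = b_1 \wedge b_2) \\
((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R_1 \fatsemi_{\vee} R_2 \ \text{iff} \\
\quad \exists \theta, \theta', b_1, b_2.\ ((\sigma, \theta), (\sigma', \theta'), b_1) \models R_1 \wedge ((\theta, \Sigma), (\theta', \Sigma'), b_2) \models R_2 \wedge (b = b_1 \vee b_2)
\end{array}
\]

\[
\begin{array}{l}
\mathsf{Sta}(P, R) \ \text{iff}\ \forall \sigma, \Sigma, \sigma', \Sigma', b.\ ((\sigma, \Sigma) \models P) \wedge (((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R) \Longrightarrow ((\sigma', \Sigma') \models P) \\
\mathsf{Precise}(P) \ \text{iff}\ \forall \sigma_1, \Sigma_1, \sigma_2, \Sigma_2, \sigma_1', \Sigma_1', \sigma_2', \Sigma_2'. \\
\quad ((\sigma_1 \uplus \sigma_2 = \sigma_1' \uplus \sigma_2') \wedge ((\sigma_1, \_) \models P) \wedge ((\sigma_1', \_) \models P) \Longrightarrow (\sigma_1 = \sigma_1')) \\
\quad \wedge\ ((\Sigma_1 \uplus \Sigma_2 = \Sigma_1' \uplus \Sigma_2') \wedge ((\_, \Sigma_1) \models P) \wedge ((\_, \Sigma_1') \models P) \Longrightarrow (\Sigma_1 = \Sigma_1')) \\
I \triangleright R \ \text{iff}\ ([I] \Rightarrow R) \wedge (R \Rightarrow I \ltimes I) \wedge \mathsf{Precise}(I) \\
\mathsf{MPrecise}(P, Q) \ \text{iff} \\
\quad \forall \theta_1, \theta_1', \theta_2, \theta_2'.\ (\theta_1 \uplus \theta_2 = \theta_1' \uplus \theta_2') \wedge ((\_, \theta_1) \models P) \wedge ((\theta_1', \_) \models Q) \Longrightarrow (\theta_1 = \theta_1')
\end{array}
\]
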
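Figure 5: Semantics of assertions (part I).

\[
(\textit{HCState}) \quad D ::= \mathbb{C} \mid \bullet
\qquad
(\textit{FullState}) \quad S ::= (\sigma, w, D, \Sigma) \quad \text{where } w \in \mathit{Nat}
\]

\[
\begin{array}{lcl}
(\sigma, w, D, \Sigma) \models P & \text{iff} & (\sigma, \Sigma) \models P \\
(\sigma, w, D, \Sigma) \models \mathsf{arem}(\mathbb{C}') & \text{iff} & D = \mathbb{C}' \\
((s, h), w, D, \Sigma) \models \mathsf{wf}(E) & \text{iff} & \exists n.\ ([\![E]\!]_s = n) \wedge (n \le w) \\
(\sigma, w, D, \Sigma) \models \lfloor p \rfloor_a & \text{iff} & \exists D'.\ (\sigma, w, D', \Sigma) \models p \\
(\sigma, w, D, \Sigma) \models \lfloor p \rfloor_w & \text{iff} & \exists w'.\ (\sigma, w', D, \Sigma) \models p \\
(\sigma, w, D, \Sigma) \models p \oplus q & \text{iff} & \exists w_1, w_2, D_1, D_2.\ (\sigma, w_1, D_1, \Sigma) \models p \wedge (\sigma, w_2, D_2, \Sigma) \models q \\
 & & \wedge\ (w = w_1 + w_2) \wedge (D = D_1 \uplus D_2) \\
(\sigma, \Sigma) \models \lfloor\!\lfloor p \rfloor\!\rfloor & \text{iff} & \exists w, D.\ (\sigma, w, D, \Sigma) \models p
\end{array}
\]

\[
D_1 \perp D_2 \ \text{iff}\ (D_1 = \bullet) \vee (D_2 = \bullet)
\qquad
D_1 \uplus D_2 \stackrel{\mathrm{def}}{=}
  \begin{cases} D_2 & \text{if } D_1 = \bullet \\ D_1 & \text{if } D_2 = \bullet \\ \text{undefined} & \text{otherwise} \end{cases}
\]

\[
(\sigma_1, w_1, D_1, \Sigma_1) \uplus (\sigma_2, w_2, D_2, \Sigma_2) \stackrel{\mathrm{def}}{=}
  \begin{cases}
  (\sigma_1 \uplus \sigma_2,\ w_1 + w_2,\ D_1 \uplus D_2,\ \Sigma_1 \uplus \Sigma_2) & \text{if } \sigma_1 \perp \sigma_2,\ D_1 \perp D_2 \text{ and } \Sigma_1 \perp \Sigma_2 \\
  \text{undefined} & \text{otherwise}
  \end{cases}
\]

\[
S \models p * q \ \text{iff}\ \exists S_1, S_2.\ (S = S_1 \uplus S_2) \wedge (S_1 \models p) \wedge (S_2 \models q)
\]

\[
\begin{array}{l}
\mathsf{Sta}(p, R) \ \text{iff} \\
\quad \forall \sigma, w, D, \Sigma, \sigma', \Sigma', b.\ ((\sigma, w, D, \Sigma) \models p) \wedge (((\sigma, \Sigma), (\sigma', \Sigma'), b) \models R) \\
\quad \Longrightarrow \exists w'.\ (\sigma', w', D, \Sigma') \models p \wedge (b = \mathbf{false} \Longrightarrow w' = w)
\end{array}
\]
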
Figure  6:  Semantics  of  assertions  (part  II). 
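The disjoint union on full states in Figure 6 can be sketched executably. The following Python fragment is only an illustration under stated assumptions: a full state is encoded as a tuple (sigma, w, D, Sigma), heaps are dictionaries from addresses to values, the "no source code" marker (the bullet) is encoded as None, and the names `union_code` and `union_full` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Heap = Dict[int, int]
# Assumed encoding: (sigma, w, D, Sigma); D is None for the bullet marker.
FullState = Tuple[Heap, int, Optional[str], Heap]


def union_code(d1: Optional[str], d2: Optional[str]) -> Tuple[bool, Optional[str]]:
    """D1 (+) D2: defined only when at least one side is the bullet (None)."""
    if d1 is None:
        return True, d2
    if d2 is None:
        return True, d1
    return False, None


def union_full(s1: FullState, s2: FullState) -> Optional[FullState]:
    """Disjoint union of two full states; None stands for 'undefined'."""
    sigma1, w1, d1, big1 = s1
    sigma2, w2, d2, big2 = s2
    defined, d = union_code(d1, d2)
    # The union is undefined if the codes clash or either pair of heaps overlaps.
    if not defined or sigma1.keys() & sigma2.keys() or big1.keys() & big2.keys():
        return None
    # Heaps are merged, token counts are added, and the remaining code is kept.
    return {**sigma1, **sigma2}, w1 + w2, d, {**big1, **big2}
```

Tokens are additive under the union, while the remaining source code can be owned by at most one of the two parts.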
2.2  Definition  of  RGSim-T


Definition  2  (RGSim-T). 

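$R, G, I \models \{P\}\, C \preceq \mathbb{C}\, \{Q\}$ iff

for all $\sigma$ and $\Sigma$, if $(\sigma, \Sigma) \models P$, then there exists $M$ such that $R, G, I \models (C, \sigma, M) \preceq_{Q} (\mathbb{C}, \Sigma)$.

Whenever $R, G, I \models (C, \sigma, M) \preceq_{Q} (\mathbb{C}, \Sigma)$, then $(\sigma, \Sigma) \models I * \mathsf{true}$ and the following are true:

1. for any $\sigma_F$, $\Sigma_F$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$ and $\Sigma \perp \Sigma_F$, then there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and one of the following holds:

   (a) either, there exist $M'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$ and $R, G, I \models (C', \sigma', M') \preceq_{Q} (\mathbb{C}', \Sigma')$;

   (b) or, there exists $M'$ such that $M' < M$, $((\sigma, \Sigma), (\sigma', \Sigma), \mathbf{false}) \models G^{+} * \mathsf{True}$ and $R, G, I \models (C', \sigma', M') \preceq_{Q} (\mathbb{C}, \Sigma)$;

2. for any $\sigma_F$, $\Sigma_F$, $e$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \xrightarrow{\,e\,} (C', \sigma'')$ and $\Sigma \perp \Sigma_F$, then there exist $\sigma'$, $M'$, $\mathbb{C}'$ and $\Sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$, $(\mathbb{C}, \Sigma \uplus \Sigma_F) \xrightarrow{\,e\,}{}^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$ and $R, G, I \models (C', \sigma', M') \preceq_{Q} (\mathbb{C}', \Sigma')$;

3. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models R^{+} * \mathsf{Id}$, then there exists $M'$ such that $R, G, I \models (C, \sigma', M') \preceq_{Q} (\mathbb{C}, \Sigma')$;

4. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{false}) \models R^{+} * \mathsf{Id}$, then $R, G, I \models (C, \sigma', M) \preceq_{Q} (\mathbb{C}, \Sigma')$;

5. if $C = \mathbf{skip}$, then for any $\Sigma_F$, if $\Sigma \perp \Sigma_F$, one of the following holds:

   (a) either, there exists $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma, \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$ and $(\sigma, \Sigma') \models Q$;

   (b) or, $\mathbb{C} = \mathbf{skip}$ and $(\sigma, \Sigma) \models Q$;

6. for any $\sigma_F$ and $\Sigma_F$, if $(C, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$ and $\Sigma \perp \Sigma_F$, then $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$.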

Inspired by Vafeiadis [?], we directly embed the framing aspect of separation logic in Def. 2. In each condition, we introduce the frame states $\sigma_F$ and $\Sigma_F$ at the target and source levels to represent the remaining parts of the states owned by other threads in the system. The commands $C$ and $\mathbb{C}$ must not change the frame states during their executions.

Technically, we introduce these $\sigma_F$ and $\Sigma_F$ quantifications to admit the frame rules (e.g., the b-frame rule in Fig. 7) and parallel compositionality. Suppose we remove the frame states from Definition 2, and consider the following example. We can prove

\[
\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \models \{\mathsf{emp}\}\ ([100] := 1) \preceq ([100] := 2)\ \{\mathsf{emp}\} \qquad (2.1)
\]

since both programs would abort at empty states. If the frame rule held, we would get the following by framing $[100] \mapsto 0 \wedge [100] \Mapsto 0$ onto (2.1):

\[
\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \models \{[100] \mapsto 0 \wedge [100] \Mapsto 0\}\ ([100] := 1) \preceq ([100] := 2)\ \{[100] \mapsto 0 \wedge [100] \Mapsto 0\}
\]

which obviously does not hold! (In our previous work RGSim [7], the frame rule we provided is more like an invariance rule in Hoare logic. We do not have a real frame rule for the above reason.) A similar issue also shows up in admitting parallel compositionality (the b-par rule in Fig. 7). A thread t would abort if it accesses the local state of another thread t', while the whole program may not abort with t and t' running in parallel. So we can construct a counterexample similar to (2.1) where the simulation holds for each single thread but fails for the whole program.

Here we address the above issue by embedding the framing aspect directly in the simulation definition, inspired by Vafeiadis [?]. For the simulation in Definition 2 with the $\sigma_F$ and $\Sigma_F$ quantifications, the above example (2.1) is no longer satisfied.
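To make the role of the frame quantification concrete, here is a minimal Python sketch of the counterexample. It assumes a toy heap model (a dictionary from addresses to values) in which a store to an unallocated address aborts; `store` and `step_with_frame` are hypothetical helper names, not part of the formal development.

```python
ABORT = "abort"


def store(heap, addr, value):
    """Execute [addr] := value; abort if addr is not allocated in heap."""
    if addr not in heap:
        return ABORT
    updated = dict(heap)
    updated[addr] = value
    return updated


def step_with_frame(local, frame, addr, value):
    """Run [addr] := value on local (+) frame and try to split the result.

    Returns the new local heap if the frame is left untouched, ABORT if the
    command aborts, and None if the step changed the frame, so that the
    result cannot be written as local' (+) frame.
    """
    assert not (local.keys() & frame.keys()), "local and frame must be disjoint"
    result = store({**local, **frame}, addr, value)
    if result == ABORT:
        return ABORT
    if any(result[a] != v for a, v in frame.items()):
        return None
    return {a: v for a, v in result.items() if a not in frame}
```

With an empty local state and no frame, both `[100] := 1` and `[100] := 2` abort, which is why (2.1) is provable without frames. Once a frame containing cell 100 is considered, the target step succeeds but modifies the frame, so the required splitting of the resulting state does not exist and the simulation fails.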
3  Logic

Inference rules are shown in Figures 7 and 8.


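\[
\frac{R, G, I \vdash \{P\}\, C_1 \preceq \mathbb{C}_1\, \{P'\} \qquad R, G, I \vdash \{P'\}\, C_2 \preceq \mathbb{C}_2\, \{Q\}}
     {R, G, I \vdash \{P\}\, C_1; C_2 \preceq \mathbb{C}_1; \mathbb{C}_2\, \{Q\}}\ (\textsc{b-seq})
\]

\[
\frac{P \Rightarrow (B \Leftrightarrow \mathbb{B}) * I \qquad R, G, I \vdash \{P \wedge B\}\, C_1 \preceq \mathbb{C}_1\, \{Q\} \qquad R, G, I \vdash \{P \wedge \neg B\}\, C_2 \preceq \mathbb{C}_2\, \{Q\}}
     {R, G, I \vdash \{P\}\ \mathbf{if}\ (B)\ C_1\ \mathbf{else}\ C_2 \preceq \mathbf{if}\ (\mathbb{B})\ \mathbb{C}_1\ \mathbf{else}\ \mathbb{C}_2\ \{Q\}}\ (\textsc{b-if})
\]

\[
\frac{P \Rightarrow (B \Leftrightarrow \mathbb{B}) * I \qquad R, G, I \vdash \{P \wedge B\}\, C \preceq \mathbb{C}\, \{P\}}
     {R, G, I \vdash \{P\}\ \mathbf{while}\ (B)\ C \preceq \mathbf{while}\ (\mathbb{B})\ \mathbb{C}\ \{P \wedge \neg B\}}\ (\textsc{b-while})
\]

\[
\frac{\begin{array}{c}
R \vee G_2, G_1, I \vdash \{P_1 * P\}\, C_1 \preceq \mathbb{C}_1\, \{Q_1 * Q_1'\} \qquad R \vee G_1, G_2, I \vdash \{P_2 * P\}\, C_2 \preceq \mathbb{C}_2\, \{Q_2 * Q_2'\} \\
P \vee Q_1' \vee Q_2' \Rightarrow I \qquad I \rhd R
\end{array}}
     {R, G_1 \vee G_2, I \vdash \{P_1 * P_2 * P\}\, C_1 \parallel C_2 \preceq \mathbb{C}_1 \parallel \mathbb{C}_2\, \{Q_1 * Q_2 * (Q_1' \wedge Q_2')\}}\ (\textsc{b-par})
\]

\[
\frac{}{\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \vdash \{P\}\ \mathbf{skip} \preceq \mathbf{skip}\ \{P\}}\ (\textsc{b-skip})
\qquad
\frac{P \Rightarrow (E = \mathbb{E})}{\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \vdash \{P\}\ \mathbf{print}(E) \preceq \mathbf{print}(\mathbb{E})\ \{P\}}\ (\textsc{b-prt})
\]

\[
\frac{R, G, I \vdash \{P\}\, C \preceq \mathbb{C}\, \{Q\} \qquad G^{+} \Rightarrow G \qquad \mathsf{Sta}(P', (R')^{+} * \mathsf{Id}) \qquad I' \rhd \{R', G'\} \qquad P' \Rightarrow I' * \mathsf{true}}
     {R * R', G * G', I * I' \vdash \{P * P'\}\, C \preceq \mathbb{C}\, \{Q * P'\}}\ (\textsc{b-frame})
\]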

Pl,Gl,7i  h  {Pl}GACM{gi}  P2,G2,72  h  {P2}CM^C{g2} 
MPrecise(7i,72)  ((Gi)+ “  (G2)+)  ^  (GilGa)^  (PiiPa)^  ^  ((Pi)+ ?  (P2)+) 
(Pi  ?  Pa),  (Gi  5  Ga),  (7i  ?  h)  h  {Pi  ?  P2}G^C{gi  ?  ga} 

(trans)

\[
\frac{R, G, I \vdash \{P \wedge \mathsf{arem}(\mathbb{C})\}\, C\, \{Q \wedge \mathsf{arem}(\mathbf{skip})\}}
     {R, G, I \vdash \{P\}\, C \preceq \mathbb{C}\, \{Q\}}\ (\textsc{u2b})
\]


Figure  7:  Selected  binary  inference  rules. 


Definition  3  (Abstract  Step  “Implication”). 

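$p \overset{G}{\Rrightarrow}{}^{+} q$ iff

for any $\sigma$, $w$, $D$, $\Sigma$ and $\Sigma_F$, if $(\sigma, w, D, \Sigma) \models p$ and $\Sigma \perp \Sigma_F$, then there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(D, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma, \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$ and $(\sigma, w', \mathbb{C}', \Sigma') \models q$.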

We  also  define  the  following  syntactic  sugars: 

.  Em  P  (*)  •  \  0  •  CT* 

p  ^  g  iff  p  =>  q  p  g  iff  p  =>  g  p  ^  g  iff  p  ^  q 

p  =§>*  g  iff  p  =§>"*"  q  V  p  g  p  g  iff  p  q  V  p  g 

Note that here we introduce the $\Sigma_F$ quantification, similar to Definition 2 for RGSim-T. In our CSL-LICS'14 paper, we simplified the above definition and defined only a special case of it to save space. The more general case $p \Rrightarrow^{+} q$ defined here is useful in the A-CONSEQ rule, which is omitted in our CSL-LICS'14 paper.

We prove a few properties of the abstract step implication, as shown in Figure 9. For instance, the first rule says that we can derive $(P \wedge \mathsf{arem}(\mathbb{C})) \Rrightarrow^{+} (Q \wedge \mathsf{arem}(\mathbf{skip}) \wedge \mathsf{wf}(E))$ by executing the source code $\mathbb{C}$. And since the source
$\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \vdash \{p\}\ \mathbf{skip}\ \{p\}$


(skip) 


Esl  [p]c[g]  c  is  silent 

-  (env) 

Emp,  Emp,  emp  h  {p}c{q} 


l-sL  [p]C[q] 


(HpU  li'liJ)  =>  G  *  True  I  >  G  pV  q  ^  I  *  true 
II],G,I^  {p}{C){q} 


(atom) 


P  p'  1“ sL  [p^]G[(ir']  q'  q  +  €  {a,  b} 

([[pjj  oc  [[g_[j)  ^  G  *  True  1 1>  G  pV  q  ^  I  *  true 


[71,G,/h{p}(C)M 


(ATOM+) 


[71,G,/h{p}(C){g}  Su{{p,q},R*\d) 

R,G,Ih{p}{G){q} 


I^R 

-  (atom-r) 


7?,G,7h{p}C7i{p'}  7?,G,7h{p'}G2W 

77,G,7h{p}Gi;G2{g} 


(seq) 


p  ^  {B  —  B)  *  I  p  A  B  ^  p'  *  (wf(l)  A  emp)  7?,  G,  7  h  {p'}G{p} 
7?,  G,  7  h  {p} while  (B)  C{p  A  iB} 


(while) 


R,  G,7h{p}G{g} 
7?,G,7h{LpJw}G{LgJw} 


(hide-w) 


7?,G,7  I- {p}G{<j}  Sta(p',(7?')^*ld)  7'|>{7?',G'}  p' ^  7' *  true  G+ 

7?  *  7?',  G  *  G',  7  *  7'  h  {p  *  p'}G{«7  *  p'} 

7?,  G,  7  h  {p}C{q}  Sta(p',  {7?+  *  Id,  G  *  True}) 

(fr-conj) 
(arem) 


(frame) 


7?,G,7h  {p®p'}G{(7®p'} 

77,G,7  h  {[pja  A  arem(Ci)}G{[gJa  A  arem(C2)} 


77,  G,  7  h  { [pj  a  A  arem  (Cl ;  Cs ) }  G{  L<jJ  a  A  a  rem  (C2 ;  Ca ) } 
77,G,7h  {pi}G{gi}  77,  G,  7  h  {p2}G{g2} 


77,  G,  7  h  {pi  V  P2}G{(Ji  V  52} 


(disj) 


p  =^*  p' 


77,  G,  7  h  {p'}G{g'}  g' =§>*  g  Sta({p,  g},  77  *  Id)  p  V  g  ^  7  *  true 


77,G,7h{p}G{g} 


(a-conseq) 


Figure  8:  Selected  unary  inference  rules. 
code makes multiple steps, we are allowed to increase the number of tokens ($\mathsf{wf}(E)$). We can also execute the source code in trivial cases, for example, when the source code is $\mathbf{skip}; \mathbb{C}$, or when it is a while loop and we know the value of the loop condition for sure. In those cases, the step of the source code is an identity transition. Moreover, the abstract step implication is transitive, and it also admits a "frame rule" (i.e., local reasoning).


C  /  skip  hsL  [P]C[Q] 

{P  A  arem(C))  {Q  A  arem(skip)  A  vjf{E)) 

P  ^  I  *  true 

(P  A  arem(skip;  C))  (P  A  arem(C)  A  wf(P)) 


_P  =>  B  *  7 

(P  A  arem(if  (B)  Ci  else  C2))  (P  A  arem(Ci)  A  wf(P)) 
P  =>  (“'B)  *  I 

(P  A  arem(if  (B)  Ci  else  C2))  (P  A  arem(C2)  A  wf(P)) 


)  *  I 


(P  A  arem(while  (B)  C))  (P  A  arem(C;  while  (B)  C)  A  wf(P)) 
P  ^  (“'®)  *  I 

(PAarem(while  (B)  C))  (PA  arem(skip)  A  wf(P)) 

(P  A  arem(Ci))  (Q  A  arem(C2)  A  wf(P)) 

(P  A  arem(Ci;  C3))  (Q  A  arem(C2;  C3)  A  wf(P)) 


p  =§-"'■  p' 


p' q  /oG 


P=>P 


/  G[  -I-  / 

p  =»+  q 


G  + 

p  q 


G  + 

P  =>  9 


G'  ^  G 


G  +  G  + 

Pi  =>  9l  P2  =>  92 

(pi  V  P2)  =§>+  (qi  V  (72) 


G  + 

p  q 


(P*P')  (9*P') 


Figure  9:  Properties  of  p  g. 

Below we discuss some interesting rules that are not shown in our CSL-LICS'14 paper due to the space limit. The binary rules are very similar to those in our previous work RGSim [7]. The TRANS rule shows the transitivity of our RGSim-T relation.

For the unary rules in Figure 8, in addition to the rules for atomic blocks, we have the SKIP and ENV rules to reason about skip and primitive instructions. Here we assume the unary logic handles only programs that do not produce external events (e.g., the ENV rule has a side condition saying that "c is silent"). For commands producing events, such as the print command, we require lockstep at the target and source levels and prove such refinement using the binary inference rules (e.g., the B-PRT rule in Figure 7). It is also possible to extend the current unary logic with assertions for event traces and provide unary rules to reason about commands with events. Note that although the shared resource is empty in the SKIP and ENV rules, we can derive rules allowing resource sharing from them and the FRAME rule in Figure 8.
In addition to the rules for while loops as in the CSL-LICS'14 paper, we also have unary rules for sequential composition (the SEQ rule in Figure 8) and for if-then-else composition (omitted here), both of which are in the same forms as in LRG [2]. The unary frame rule is similar to the binary one in Figure 7. It is also in the same form as in LRG [2].

The FR-CONJ rule is like the frame rule in RGSep [12]. The frame $p'$ may specify the number of tokens used by the context of the code $C$, i.e., the code $C$ does not consume these tokens in $p'$. The frame $p'$ may also specify the shared concrete and abstract states (this case usually occurs when the number of tokens depends on the concrete and abstract states). So we use the new operator $\otimes$ to ensure that the concrete and abstract states specified in $p$ and $p'$ coincide.

The AREM rule is like a frame rule over source code. It allows us to reason about refinement using "local" source code, i.e., source code which is really refined by the target.

The A-CONSEQ rule allows us to execute the source code outside of an atomic block. It requires that the transitions of the source code over the shared states satisfy $G^{+}$, but it is usually used when the steps are simply identity transitions. For instance, we can use the rule to unfold a while loop at the source at any time in a refinement proof (we do not have to be in an atomic block of the target code). When $p \overset{G}{\Rrightarrow}{}^{*} p'$ and $q' \overset{G}{\Rrightarrow}{}^{*} q$ are $p \Rightarrow p'$ and $q' \Rightarrow q$ respectively, this rule becomes the normal CONSEQ rule (see RGSep [12] and LRG [2]).

We can also derive the following WHILE-TERM rule from the WHILE rule. The derivation is shown in Section 5.


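\[
\frac{\begin{array}{c}
R, G, I \vdash \{p \wedge B \wedge (E = \alpha)\}\, C\, \{p \wedge (E < \alpha)\} \qquad p \wedge B \Rightarrow E > 0 \\
p \Rightarrow ((B = B) \wedge (E = E)) * I \qquad G^{+} \Rightarrow G \qquad \alpha \text{ is a fresh logical variable}
\end{array}}
     {R, G, I \vdash \{\lfloor p \rfloor_{w}\}\ \mathbf{while}\ (B)\ C\ \{\lfloor p \rfloor_{w} \wedge \neg B\}}\ (\textsc{while-term})
\]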


The WHILE-TERM rule is similar to a total correctness while rule (e.g., see [10]). In every round of the loop, the loop variant E decreases (but should always be positive). We can verify refinement for such a locally-terminating loop (a loop that always terminates regardless of environment steps) without specifying tokens. To derive this rule, we actually need to introduce the number of tokens as an auxiliary state for the loop iterations and relate it to the loop variant E in the real state.
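The relation between tokens and the loop variant can be illustrated with a small executable sketch. This is not the derivation itself (which is in the section cited above), only a model under stated assumptions: the variant is an integer-valued function of the state, the auxiliary token count is initialised to the variant's value, and each round pays one token. The function name `run_with_tokens` is hypothetical.

```python
def run_with_tokens(state, cond, body, variant):
    """Run `while cond(state): body(state)` while tracking auxiliary tokens.

    Returns (number of rounds executed, tokens left over).
    """
    tokens = variant(state)  # auxiliary state: as many tokens as the variant's value
    rounds = 0
    while cond(state):
        before = variant(state)
        if before <= 0:
            raise ValueError("variant must be positive while the loop condition holds")
        tokens -= 1  # each round of the loop pays one token
        body(state)
        if variant(state) >= before:
            raise ValueError("variant must strictly decrease in every round")
        # Invariant relating the auxiliary tokens to the variant in the real state.
        assert tokens >= variant(state)
        rounds += 1
    return rounds, tokens
```

Because the variant is positive whenever the loop condition holds and drops by at least one per round, the token count never becomes negative, which is the intuition behind not having to mention tokens in the rule's premises.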

Soundness  of  the  logic  is  proved  in  Section]^  (where  we  also  define  the  unary  judgment  semantics). 
4  Examples


In  this  section,  we  verify  the  examples  claimed  in  our  CSL-LICS’14  paper  (see  Figure  10).  To  simplify 
the  presentation  of  the  proofs,  assume  we  always  have  the  ownerships  of  program  variables. 


Linearizability & Lock-Freedom      | Counter and its variants
                                    | Treiber stack
                                    | Michael-Scott lock-free queue [8]
                                    | DGLM lock-free queue [1]
Non-Atomic Object Correctness       | Synchronous queue [9]
Correctness of Optimized Algo       | Counter vs. its variants
(Equivalence)                       | TAS lock vs. TTAS lock [3]

Figure  10:  Verified  examples  using  our  logic. 


4.1  Counter  and  Its  Variants 


In  Figure  11  we  show  four  possible  implementations  of  the  counter.  Though  they  are  quite  simple,  they 
illustrate  different  choices  that  programmers  may  make  to  implement  a  concurrent  object.  The  abstract 
atomic  INC  operation  is  shown  below: 


INC() { X := X + 1; }


 1  inc() {
 2    local t, b;
 3    b := false;
 4    while (!b) {
 5      < t := x; >
 6      b := cas(&x, t, t+1);
 7    }
 8  }

 1  inc'() {
 2    local t, b;
 3    b := false;
 4    < t := x; >
 5    while (!b) {
 6      b := cas(&x, t, t+1);
 7      < t := x; >
 8    }
 9  }

 1  incOpt() {
 2    local t, b, b';
 3    b := false;
 4    while (!b) {
 5      b' := false;
 6      while (!b') {
 7        < t := x; >
 8        < b' := (t = x); >
 9      }
10      b := cas(&x, t, t+1);
11    }
12  }

 1  incOpt'() {
 2    local t, b, b';
 3    b := false;
 4    while (!b) {
 5      < t := x; >
 6      < b' := (t = x); >
 7      while (!b') {
 8        < t := x; >
 9        < b' := (t = x); >
10      }
11      b := cas(&x, t, t+1);
12    }
13  }


Figure  11:  Various  implementations  of  counter. 
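For readers who want to run the first implementation, the following is a hedged Python sketch of `inc`. Python has no hardware compare-and-swap, so the `Cell` class below simulates an atomic read and an atomic `cas` with an internal lock; `Cell`, `run` and the returned failure count are assumptions made for illustration and are not part of the verified code.

```python
import threading


class Cell:
    """A shared integer cell with simulated atomic read and compare-and-swap."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def read(self):
        with self._guard:
            return self._value

    def cas(self, expected, new):
        # Atomically: if the cell holds `expected`, write `new` and report success.
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def inc(x):
    """The retry loop of inc() in Figure 11; returns the number of failed CAS attempts."""
    failures = 0
    b = False
    while not b:
        t = x.read()          # < t := x; >
        b = x.cas(t, t + 1)   # b := cas(&x, t, t+1);
        if not b:
            failures += 1
    return failures


def run(n_threads, n_incs):
    """Let n_threads threads each call inc() n_incs times; return the final value."""
    x = Cell(0)

    def worker():
        for _ in range(n_incs):
            inc(x)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return x.read()
```

Whatever the interleaving, every successful `cas` corresponds to exactly one abstract increment, so the final value equals the total number of `inc` calls.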

Below we first verify that each implementation C of the counter is correct w.r.t. INC. Here correctness refers to linearizability and lock-freedom together. As explained in the submitted paper, we only need to prove the following in our logic:

\[
R, G, I \vdash \{I\}\ C \preceq \mathsf{INC}\ \{I\}
\]

where R and G specify the possible actions (i.e., increments) on the well-formed shared data structure (i.e., the counter) fenced by I. All these examples share the same R, G and I, defined as follows:

\[
I \stackrel{\text{def}}{=} (x = X) \qquad R = G \stackrel{\text{def}}{=} (I \propto I) \vee [I]
\]

By the U2B rule, the above is reduced to proving the following unary judgment:

\[
R, G, I \vdash \{I \wedge \mathsf{arem}(X := X + 1)\}\ C\ \{I \wedge \mathsf{arem}(\mathbf{skip})\}
\]

The proofs are shown in Figures 12, 13, 14 and 15.

We can also prove the equivalence between incOpt and inc. That is, we prove:

\[
R, G, I \vdash \{I\}\ \mathsf{incOpt} \preceq \mathsf{inc}\ \{I\} \quad \text{and} \quad R, G, I \vdash \{I\}\ \mathsf{inc} \preceq \mathsf{incOpt}\ \{I\}
\]

Here we use the same R, G and I as above (always using x on the left side and X on the right side). The proofs are shown in Figures 17 and 18. The equivalence between incOpt' and inc is similar.

 1  inc() {
 2    local t, b;
      {I ∧ arem(X := X + 1)}
 3    b := false;
      {(¬b ∧ I ∧ arem(X := X + 1)) ∨ (b ∧ I ∧ arem(skip))}    //Applying the while rule and the hide-w rule
 4    while (!b) {
        {¬b ∧ I ∧ arem(X := X + 1) ∧ wf(0)}
        {x = X} * (emp ∧ ¬b ∧ arem(X := X + 1) ∧ wf(0))    //Applying the frame rule
 5      < t := x; >
        {(x = X = t) ∨ ((x = X ≠ t) ∧ wf(1))} * (emp ∧ ¬b ∧ arem(X := X + 1) ∧ wf(0))
        {(¬b ∧ (x = X = t) ∧ arem(X := X + 1) ∧ wf(0))
          ∨ (¬b ∧ (x = X ≠ t) ∧ arem(X := X + 1) ∧ wf(1))}
 6      b := cas(&x, t, t+1);
        {(b ∧ I ∧ arem(skip) ∧ wf(1)) ∨ (¬b ∧ I ∧ arem(X := X + 1) ∧ wf(1))}
 7    }
      {I ∧ arem(skip)}
 8  }

Figure  12:  Proving  inc  refines  INC. 


 1  inc'() {
 2    local t, b;
      {I ∧ arem(X := X + 1)}
 3    b := false;
      {¬b ∧ I ∧ arem(X := X + 1)}
 4    < t := x; >
      {(¬b ∧ (x = X = t) ∧ arem(X := X + 1))
        ∨ (¬b ∧ (x = X ≠ t) ∧ arem(X := X + 1))
        ∨ (b ∧ I ∧ arem(skip))}    //Applying the while rule and the hide-w rule
 5    while (!b) {
        {(¬b ∧ (x = X = t) ∧ arem(X := X + 1) ∧ wf(0)) ∨ (¬b ∧ (x = X ≠ t) ∧ arem(X := X + 1) ∧ wf(1))}
 6      b := cas(&x, t, t+1);
        {(b ∧ I ∧ arem(skip) ∧ wf(1)) ∨ (¬b ∧ I ∧ arem(X := X + 1) ∧ wf(1))}
 7      < t := x; >
        {(¬b ∧ (x = X = t) ∧ arem(X := X + 1) ∧ wf(1))
          ∨ (¬b ∧ (x = X ≠ t) ∧ arem(X := X + 1) ∧ wf(2))
          ∨ (b ∧ I ∧ arem(skip) ∧ wf(1))}
 8    }
      {I ∧ arem(skip)}
 9  }


Figure  13:  Proving  inc’  refines  INC. 
 1  incOpt() {
 2    local t, b, b';
      {I ∧ arem(X := X + 1)}
 3    b := false;
      {(¬b ∧ I ∧ arem(X := X + 1)) ∨ (b ∧ I ∧ arem(skip))}    //Applying the while rule and the hide-w rule
 4    while (!b) {
        {¬b ∧ I ∧ arem(X := X + 1) ∧ wf(1)}
        {(x = X) ∧ wf(1)} * (emp ∧ ¬b ∧ arem(X := X + 1))    //Applying the frame rule
 5      b' := false;
        {(¬b' ∧ (x = X) ∧ wf(1)) ∨ (b' ∧ (x = X = t)) ∨ (b' ∧ (x = X ≠ t) ∧ wf(2))}    //Applying the while rule
 6      while (!b') {
          {(x = X) ∧ wf(0)}
 7        < t := x; >
          {(x = X = t) ∨ ((x = X ≠ t) ∧ wf(1))}
 8        < b' := (t = x); >
          {(b' ∧ (x = X = t)) ∨ (b' ∧ (x = X ≠ t) ∧ wf(2)) ∨ (¬b' ∧ (x = X ≠ t) ∧ wf(1))}
 9      }
        {(x = X = t) ∨ ((x = X ≠ t) ∧ wf(2))} * (emp ∧ ¬b ∧ arem(X := X + 1))
        {(¬b ∧ (x = X = t) ∧ arem(X := X + 1) ∧ wf(0))
          ∨ (¬b ∧ (x = X ≠ t) ∧ arem(X := X + 1) ∧ wf(2))}
10      b := cas(&x, t, t+1);
        {(b ∧ I ∧ arem(skip) ∧ wf(1)) ∨ (¬b ∧ I ∧ arem(X := X + 1) ∧ wf(2))}
11    }
      {I ∧ arem(skip)}
12  }


Figure  14:  Proving  incOpt  refines  INC. 


 1  incOpt'() {
 2    local t, b, b';
      {I ∧ arem(X := X + 1)}
 3    b := false;
      {(¬b ∧ I ∧ arem(X := X + 1)) ∨ (b ∧ I ∧ arem(skip))}    //Applying the while rule and the hide-w rule
 4    while (!b) {
        {¬b ∧ I ∧ arem(X := X + 1) ∧ wf(0)}
        {x = X} * (emp ∧ ¬b ∧ arem(X := X + 1) ∧ wf(0))    //Applying the frame rule
 5      < t := x; >
        {(x = X = t) ∨ ((x = X ≠ t) ∧ wf(1))}
 6      < b' := (t = x); >
        {(b' ∧ (x = X = t)) ∨ ((x = X ≠ t) ∧ wf(1))}    //Applying the while rule
 7      while (!b') {
          {(x = X) ∧ wf(0)}
 8        < t := x; >
          {(x = X = t) ∨ ((x = X ≠ t) ∧ wf(1))}
 9        < b' := (t = x); >
          {(b' ∧ (x = X = t)) ∨ ((x = X ≠ t) ∧ wf(1))}
10      }
        {(x = X = t) ∨ ((x = X ≠ t) ∧ wf(1))} * (emp ∧ ¬b ∧ arem(X := X + 1) ∧ wf(0))
        {(¬b ∧ (x = X = t) ∧ arem(X := X + 1) ∧ wf(0))
          ∨ (¬b ∧ (x = X ≠ t) ∧ arem(X := X + 1) ∧ wf(1))}
11      b := cas(&x, t, t+1);
        {(b ∧ I ∧ arem(skip) ∧ wf(1)) ∨ (¬b ∧ I ∧ arem(X := X + 1) ∧ wf(1))}
12    }
      {I ∧ arem(skip)}
13  }


Figure 15: Proving incOpt' refines INC.
I ≝ (x = X)

R = G ≝ (∃n. (x = X = n) ∝ (x = X > n)) ∨ [I]

 1  incOpt'() {
 2    local t, b;
      {I ∧ arem(X := X + 1)}
 3    b := false;
      {(¬b ∧ I ∧ arem(X := X + 1)) ∨ (b ∧ I ∧ arem(skip))}    //Applying the while rule and the hide-w rule
 4    while (!b) {
        {¬b ∧ I ∧ arem(X := X + 1) ∧ wf(0)}
        {x = X} * (emp ∧ ¬b ∧ arem(X := X + 1) ∧ wf(0))    //Applying the frame rule
 5      < t := x; >
        {(x = X = t = α) ∨ ((x = X > α) ∧ (t = α) ∧ wf(1))}
 6      < b' := (t = x); >
        {(b' ∧ (x = X = t = α)) ∨ ((x = X > α) ∧ (t = α) ∧ wf(1))}
        {(b' ∧ (x = X = t = α)) ∨ (x = X > α)} ⊗ ((x = X = α) ∨ (x = X > α) ∧ wf(1))
        //Applying the fr-conj rule    //Applying the while rule and the hide-w rule
 7      while (!b') {
          {(x = X > α) ∧ wf(0)}
 8        < t := x; >
          {(x = X = t > α) ∨ ((x = X > t > α) ∧ wf(1))}
 9        < b' := (t = x); >
          {(b' ∧ (x = X = t > α)) ∨ ((x = X > t > α) ∧ wf(1))}
          {(b' ∧ (x = X = t > α)) ∨ ((x = X > α) ∧ wf(1))}
10      }
        {(x = X = t = α) ∨ (x = X > α)} ⊗ ((x = X = α) ∨ (x = X > α) ∧ wf(1))
        {(x = X = t = α) ∨ ((x = X > α) ∧ wf(1))}
        {(x = X = t) ∨ ((x = X ≠ t) ∧ wf(1))} * (emp ∧ ¬b ∧ arem(X := X + 1) ∧ wf(0))
        {(¬b ∧ (x = X = t) ∧ arem(X := X + 1) ∧ wf(0))
          ∨ (¬b ∧ (x = X ≠ t) ∧ arem(X := X + 1) ∧ wf(1))}
11      b := cas(&x, t, t+1);
        {(b ∧ I ∧ arem(skip) ∧ wf(1)) ∨ (¬b ∧ I ∧ arem(X := X + 1) ∧ wf(1))}
12    }
      {I ∧ arem(skip)}
13  }


Figure 16: Proving incOpt' refines INC (an alternative approach using the FR-CONJ rule). α is a logical variable.
inc ≝ (B := false; incLoop;)
incLoop ≝ (while (!B) { <T := X>; incCas; })
incCas ≝ (B := cas(&X, T, T+1);)

 1  incOpt() {
 2    local t, b, b';
      {I ∧ arem(inc)}
 3    b := false;
      {(¬b ∧ ¬B ∧ I ∧ arem(incLoop)) ∨ (b ∧ B ∧ I ∧ arem(skip))}
      //Applying the while rule and the hide-w rule
 4    while (!b) {
        {¬b ∧ ¬B ∧ I ∧ arem(incLoop) ∧ wf(0)}
 5      b' := false;
        {(¬b' ∧ ¬b ∧ ¬B ∧ (x = X) ∧ arem(incLoop) ∧ wf(0))
          ∨ (b' ∧ ¬b ∧ ¬B ∧ (x = X) ∧ (t = T) ∧ arem(incCas; incLoop) ∧ wf(0))}
        //Applying the while rule and the hide-w rule
 6      while (!b') {
          {¬b' ∧ ¬b ∧ ¬B ∧ (x = X) ∧ arem(incLoop) ∧ wf(0)}
          {¬b ∧ ¬B ∧ (x = X) ∧ arem(<T := X>; incCas; incLoop) ∧ wf(1)}
 7        < t := x; >
          {¬b ∧ ¬B ∧ (x = X) ∧ (t = T) ∧ arem(incCas; incLoop) ∧ wf(1)}
 8        < b' := (t = x); >
          {(¬b' ∧ ¬b ∧ ¬B ∧ (x = X) ∧ arem(incLoop) ∧ wf(1))
            ∨ (b' ∧ ¬b ∧ ¬B ∧ (x = X) ∧ (t = T) ∧ arem(incCas; incLoop) ∧ wf(1))}
 9      }
        {b' ∧ (x = X) ∧ (t = T) ∧ arem(incCas; incLoop) ∧ wf(0)}
10      b := cas(&x, t, t+1);
        {(b = B) ∧ I ∧ arem(incLoop) ∧ wf(1)}
        {(b ∧ B ∧ I ∧ arem(skip)) ∨ (¬b ∧ ¬B ∧ I ∧ arem(incLoop) ∧ wf(1))}
11    }
      {I ∧ arem(skip)}
12  }


Figure  17:  Proving  incOpt  refines  inc. 
incOpt ≝ (B := false; incOptLoop;)
incOptLoop ≝ (while (!B) { incOptInner; incCas; })
incOptInner ≝ (B' := false; while (!B') { <T := X>; <B' := (T = X)>; })
incCas ≝ (B := cas(&X, T, T+1);)

 1  inc() {
 2    local t, b;
      {I ∧ arem(incOpt)}
 3    b := false;
      {(¬b ∧ ¬B ∧ I ∧ arem(incOptLoop)) ∨ (b ∧ B ∧ I ∧ arem(skip))}
      //Applying the while rule and the hide-w rule
 4    while (!b) {
        {¬b ∧ ¬B ∧ I ∧ arem(incOptLoop) ∧ wf(0)}
 5      < t := x; >
        {¬b ∧ ¬B ∧ (x = X) ∧ (t = T) ∧ arem(incCas; incOptLoop) ∧ wf(1)}
 6      b := cas(&x, t, t+1);
        {(b = B) ∧ I ∧ arem(incOptLoop) ∧ wf(1)}
 7    }
      {I ∧ arem(skip)}
 8  }


Figure 18: Proving inc refines incOpt.
4.2  TAS Lock and TTAS Lock


 1  lock() {
 2    local b, b';
 3    b := true;
 4    while (b) {
 5      < b' := l; >
 6      while (b') {
 7        < b' := l; >
 8      }
 9      b := getAndSet(&l, true);
10    }
11  }

 1  unlock() {
 2    < l := false; >
 3  }


 1  LOCK() {
 2    local B;
 3    B := getAndSet(&L, true);
 4    while (B) {
 5      B := getAndSet(&L, true);
 6    }
 7  }

 1  UNLOCK() {
 2    < L := false; >
 3  }


Figure  19:  TTASLock  (the  left)  and  TASLock  (the  right). 
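The two locks in Figure 19 can be exercised with the following Python sketch. It is an illustration only: `AtomicBool` simulates an atomic boolean with `getAndSet` by means of an internal lock, `time.sleep(0)` is added inside the spin loops purely to yield to other threads, and `contend` is a hypothetical test harness that uses each lock to protect a deliberately non-atomic increment.

```python
import threading
import time


class AtomicBool:
    """A shared boolean with simulated atomic read, write and getAndSet."""

    def __init__(self, value=False):
        self._value = value
        self._guard = threading.Lock()

    def read(self):
        with self._guard:
            return self._value

    def write(self, new):
        with self._guard:
            self._value = new

    def get_and_set(self, new):
        with self._guard:
            old = self._value
            self._value = new
            return old


def tas_lock(l):
    """LOCK() in Figure 19: spin directly on getAndSet."""
    b = l.get_and_set(True)
    while b:
        time.sleep(0)  # yield; not part of the algorithm
        b = l.get_and_set(True)


def ttas_lock(l):
    """lock() in Figure 19: spin on plain reads, then try getAndSet."""
    b = True
    while b:
        bp = l.read()
        while bp:
            time.sleep(0)  # yield; not part of the algorithm
            bp = l.read()
        b = l.get_and_set(True)


def unlock(l):
    l.write(False)


def contend(lock_fn, n_threads=4, n_rounds=100):
    """Protect a non-atomic read-modify-write with lock_fn; return the final count."""
    l = AtomicBool(False)
    shared = {"n": 0}

    def worker():
        for _ in range(n_rounds):
            lock_fn(l)
            tmp = shared["n"]
            time.sleep(0)  # widen the race window inside the critical section
            shared["n"] = tmp + 1
            unlock(l)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared["n"]
```

Both `contend(tas_lock)` and `contend(ttas_lock)` should return `n_threads * n_rounds`, reflecting that the two implementations provide the same mutual exclusion behaviour; the TTAS variant only differs in spinning on reads before attempting the atomic update.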


In Figure 19 we show the implementations of TTAS lock and TAS lock [3]. We can prove the equivalence between these two implementations. That is, we prove:

\[
\begin{array}{c}
R, G, I \vdash \{I\}\ \mathsf{lock} \preceq \mathsf{LOCK}\ \{I\} \quad \text{and} \quad R, G, I \vdash \{I\}\ \mathsf{LOCK} \preceq \mathsf{lock}\ \{I\} \\
R, G, I \vdash \{I\}\ \mathsf{unlock} \preceq \mathsf{UNLOCK}\ \{I\} \quad \text{and} \quad R, G, I \vdash \{I\}\ \mathsf{UNLOCK} \preceq \mathsf{unlock}\ \{I\}
\end{array}
\]

As in the example of counters, R and G specify the possible actions on the well-formed shared data structure fenced by I. Here R, G and I can be defined as follows:

\[
I \stackrel{\text{def}}{=} (l = L) \qquad R = G \stackrel{\text{def}}{=} (I \propto I) \vee [I]
\]

The proofs for the refinements between unlock and UNLOCK are straightforward since their code is the same. We show the proofs for the refinements between lock and LOCK in Figures 20 and 21.
GAS ≝ (B := getAndSet(&L, true))

LoopGAS ≝ (while (B) GAS;)

 1  lock() {
 2    local b, b';
      {I ∧ arem(LOCK)}
 3    b := true;
      {(b ∧ I ∧ arem(GAS; LoopGAS)) ∨ (¬b ∧ I ∧ arem(skip))}    //Applying the while rule and the hide-w rule
 4    while (b) {
        {b ∧ I ∧ arem(GAS; LoopGAS) ∧ wf(0)}
 5      < b' := l; >
        {(b ∧ b' ∧ B ∧ I ∧ arem(LoopGAS) ∧ wf(1)) ∨ (b ∧ ¬b' ∧ I ∧ arem(GAS; LoopGAS) ∧ wf(0))}
        {b ∧ I ∧ arem(GAS; LoopGAS)}    //Applying the while rule and the hide-w rule
 6      while (b') {
          {b ∧ I ∧ arem(GAS; LoopGAS) ∧ wf(0)}
 7        < b' := l; >
          {(b ∧ b' ∧ B ∧ I ∧ arem(LoopGAS) ∧ wf(1)) ∨ (b ∧ ¬b' ∧ I ∧ arem(GAS; LoopGAS) ∧ wf(0))}
          {(b ∧ b' ∧ I ∧ arem(GAS; LoopGAS) ∧ wf(1)) ∨ (b ∧ ¬b' ∧ I ∧ arem(GAS; LoopGAS) ∧ wf(0))}
 8      }
        {b ∧ I ∧ arem(GAS; LoopGAS) ∧ wf(0)}
 9      b := getAndSet(&l, true);
        {(b = B) ∧ I ∧ arem(LoopGAS) ∧ wf(1)}
        {(¬b ∧ I ∧ arem(skip) ∧ wf(1)) ∨ (b ∧ I ∧ arem(GAS; LoopGAS) ∧ wf(1))}
10    }
      {I ∧ arem(skip)}
11  }


Figure  20:  Proving  TTASLock  refines  TASLock. 


loopTTAS ≝ (while (b) {...})


 1  LOCK() {
 2    local B;
      {I ∧ arem(lock)}
 3    B := getAndSet(&L, true);
      {(b = B) ∧ I ∧ arem(loopTTAS) ∧ wf(1)}
      {(b = B) ∧ I ∧ arem(loopTTAS)}    //Applying the while rule and the hide-w rule
 4    while (B) {
        {b ∧ B ∧ I ∧ arem(loopTTAS) ∧ wf(0)}
 5      B := getAndSet(&L, true);
        {(b = B) ∧ I ∧ arem(loopTTAS) ∧ wf(1)}
 6    }
      {¬b ∧ ¬B ∧ I ∧ arem(loopTTAS)}
      {I ∧ arem(skip)}
 7  }


Figure  21:  Proving  TASLock  refines  TTASLock. 
4.3  Treiber Stack


 1  push(v) {
 2    local x, t, b;
 3    b := false;
 4    x := cons(v, null);
 5    while (!b) {
 6      < t := S; >
 7      x.next := t;
 8      b := cas(&S, t, x);
 9    }
10  }

 1  pop() {
 2    local v, x, t, b;
 3    b := false;
 4    while (!b) {
 5      < t := S; >
 6      if (t = null) {
 7        v := EMPTY;
 8        b := true;
 9      } else {
10        v := t.data;
11        x := t.next;
12        b := cas(&S, t, x);
13      }
14    }
15    return v;
16  }


 1  PUSH(V) {
 2    < Stk := V :: Stk; >
 3  }

 1  POP() {
 2    local V;
 3    < if (Stk = ε) {
 4        V := EMPTY;
 5      } else {
 6        V := head(Stk);
 7        Stk := tail(Stk);
 8      }
 9    >
10    return V;
11  }


Figure  22:  Treiber  stack. 


In Figure 22 we show the implementation of the Treiber stack (on the left of the figure) and the abstract atomic operations (on the right). The abstract PUSH and POP operations manipulate an abstract mathematical list Stk, and when popping from an empty stack, POP returns EMPTY.
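A runnable approximation of the concrete push and pop can be written in Python as follows. This is a sketch under stated assumptions: `AtomicRef` simulates an atomic pointer with compare-and-swap (compared by object identity) using an internal lock, and `EMPTY` is a sentinel object. Because Python never reuses a node object that is still referenced, address reuse cannot occur in this model; that is a property of the sketch, not of the algorithm.

```python
import threading

EMPTY = object()  # sentinel returned when popping from an empty stack


class Node:
    __slots__ = ("data", "next")

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class AtomicRef:
    """A shared reference with simulated atomic read and compare-and-swap."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def read(self):
        with self._guard:
            return self._value

    def cas(self, expected, new):
        with self._guard:
            if self._value is expected:  # pointer (identity) comparison
                self._value = new
                return True
            return False


class TreiberStack:
    def __init__(self):
        self.S = AtomicRef(None)  # top-of-stack pointer

    def push(self, v):
        x = Node(v)
        while True:
            t = self.S.read()       # < t := S; >
            x.next = t              # x.next := t;
            if self.S.cas(t, x):    # b := cas(&S, t, x);
                return

    def pop(self):
        while True:
            t = self.S.read()       # < t := S; >
            if t is None:
                return EMPTY
            if self.S.cas(t, t.next):  # b := cas(&S, t, x);
                return t.data
```

A single-threaded use such as pushing 1, 2, 3 and then popping yields 3, 2, 1 and finally `EMPTY`, matching the abstract list operations on `Stk`.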

Below we use our logic to prove linearizability and lock-freedom together for the Treiber stack. As explained in the submitted paper, we only need to prove the following in our logic:

\[
R, G, I \vdash \{I \wedge (v = V)\}\ \mathsf{push}(v) \preceq \mathsf{PUSH}(V)\ \{I\} \quad \text{and} \quad R, G, I \vdash \{I\}\ \mathsf{pop} \preceq \mathsf{POP}\ \{I \wedge (v = V)\}
\]

By the U2B rule, the above is reduced to proving the following unary judgments:

\[
\begin{array}{l}
R, G, I \vdash \{I \wedge \mathsf{arem}(\mathsf{PUSH}(V)) \wedge (v = V)\}\ \mathsf{push}(v)\ \{I \wedge \mathsf{arem}(\mathbf{skip})\} \\
\text{and} \quad R, G, I \vdash \{I \wedge \mathsf{arem}(\mathsf{POP})\}\ \mathsf{pop}\ \{I \wedge \mathsf{arem}(\mathbf{skip}) \wedge (v = V)\}
\end{array}
\]

We define the precise invariant I, the rely R and the guarantee G in Figure 23. The invariant I in Figure 23 maps the value sequence A of the concrete list pointed to by S (denoted by $(S = x) * \mathsf{ls}(x, A, \mathsf{null})$) to the abstract stack Stk. To ensure there is no "ABA" problem [3], we follow Turon and Wand [?] and introduce a write-only auxiliary variable GN to remember the nodes which used to be on the stack but no longer are. The precise invariant for shared states should include those garbage nodes (garb). GN does not affect the behaviors of the implementation and is introduced for verification only.


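\[
\begin{array}{lcl}
I & \stackrel{\text{def}}{=} & \exists x, A.\ (\mathsf{Stk} = A) \wedge (S = x) * \mathsf{ls}(x, A, \mathsf{null}) * \mathsf{garb} \\[2pt]
\mathsf{node}(x, v, y) & \stackrel{\text{def}}{=} & x \mapsto (v, y) \qquad\qquad \mathsf{node}(x) \stackrel{\text{def}}{=} \mathsf{node}(x, \_, \_) \\[2pt]
\mathsf{ls}(x, A, y) & \stackrel{\text{def}}{=} & (x = y \wedge A = \epsilon \wedge \mathsf{emp}) \vee (x \neq y \wedge \exists z, v, A'.\ A = v{::}A' \wedge \mathsf{node}(x, v, z) * \mathsf{ls}(z, A', y)) \\[2pt]
\mathsf{ls}(x, y) & \stackrel{\text{def}}{=} & \exists A.\ \mathsf{ls}(x, A, y) \\[2pt]
\mathsf{garb} & \stackrel{\text{def}}{=} & \exists S_g.\ (\mathsf{GN} = S_g) * (\circledast_{x \in S_g}.\ \mathsf{node}(x)) \\[2pt]
R = G & \stackrel{\text{def}}{=} & (\mathsf{Push} \vee \mathsf{Pop} \vee \mathsf{Id}) * \mathsf{Id} \wedge (I \ltimes I) \\[2pt]
\mathsf{Push} & \stackrel{\text{def}}{=} & \exists x, y, v, A.\ ((\mathsf{Stk} = A) \wedge (S = y)) \propto ((\mathsf{Stk} = v{::}A) \wedge (S = x) * \mathsf{node}(x, v, y)) \\[2pt]
\mathsf{Pop} & \stackrel{\text{def}}{=} & \exists x, y, v, A, S_g.\ ((\mathsf{Stk} = v{::}A) \wedge (S = x) * \mathsf{node}(x, v, y) * (\mathsf{GN} = S_g)) \\
 & & \quad \propto\ ((\mathsf{Stk} = A) \wedge (S = y) * \mathsf{node}(x, v, y) * (\mathsf{GN} = S_g \cup \{x\}))
\end{array}
\]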

Figure  23:  Precise  invariant,  rely  and  guarantee  of  Treiber  stack. 
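The list-segment predicate ls(x, A, y) used in the invariant can be read operationally. The following Python check is a minimal sketch under the assumption that a concrete heap is a dictionary mapping each node address to a pair (data, next) and that `None` plays the role of null; the function name `satisfies_ls` is hypothetical. It returns True exactly when the given heap consists of the segment from x to y carrying the value sequence A and nothing else, mirroring the use of emp and the separating conjunction in the definition.

```python
def satisfies_ls(heap, x, values, y):
    """Check whether `heap` is exactly a list segment from x to y holding `values`.

    heap   : dict mapping address -> (data, next_address)
    values : the abstract value sequence A, as a Python list
    """
    remaining = dict(heap)  # cells not yet consumed by a node(...) conjunct
    for v in values:
        # Non-empty case: x must differ from y and own a node holding v.
        if x == y or x not in remaining:
            return False
        data, nxt = remaining.pop(x)  # separating conjunction: each cell is used once
        if data != v:
            return False
        x = nxt
    # Empty case: x = y and the rest of the heap must be empty (emp).
    return x == y and not remaining
```

Popping each visited cell from `remaining` is what rules out cyclic or shared structures, which is the executable counterpart of the disjointness enforced by the separating conjunction.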
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The  guarantee  includes  the  push  and  the  pop  actions.  At  the  concrete  side,  the  steps  at  line  8  for 
push  and  line  12  for  pop  in  Figure  [22]  are  the  linearization  points,  i.e.,  they  correspond  to  the  abstract 
atomic  PUSH  and  POP  operations  (thus  the  effect  bits  of  the  actions  are  true!).  Note  that  when  popping 
a  node,  we  also  add  the  node  to  GN.  The  rely  of  a  thread  is  the  same  as  its  guarantee. 

We show the proof in Figure 24. For linearizability, we let the abstract operations be executed
simultaneously with the concrete code at linearization points. Note that when popping from an empty
stack, the linearization point is at line 5 (see pop in Figure 22), where the thread reads the stack pointer.

On lock-freedom, we know that a failure of the cas at line 8 for push or at line 12 for pop must be
caused by the successful progress of other threads. In the proof, we can increase the number of tokens
when the environment updates the S pointer (i.e., the environment does Push or Pop), so the thread is
allowed to do more loop iterations.
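The role of the two cas steps can be illustrated with a small executable sketch. It is not the verified code of Figure 22: the cas is simulated with a lock, and the names AtomicRef, Node, TreiberStack and EMPTY are assumptions made for this illustration. The auxiliary set GN is updated right after a successful cas instead of atomically with it, and it keeps popped nodes alive, which loosely mirrors the garbage nodes in the invariant.

```python
import threading

class AtomicRef:
    """Reference cell with compare-and-swap, simulated with a lock (assumption)."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        # Atomically store `new` only if the cell still holds `expected` (identity test).
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

EMPTY = object()  # distinguished result of popping an empty stack

class TreiberStack:
    def __init__(self):
        self.S = AtomicRef(None)  # top-of-stack pointer
        self.GN = set()           # auxiliary: popped nodes, kept for illustration only

    def push(self, v):
        x = Node(v)
        while True:
            t = self.S.get()
            x.next = t
            if self.S.cas(t, x):  # a successful cas is the linearization point
                return
            # A failed cas means some other thread changed S, i.e. made progress.

    def pop(self):
        while True:
            t = self.S.get()      # linearization point when the stack is empty
            if t is None:
                return EMPTY
            v, x = t.data, t.next
            if self.S.cas(t, x):  # linearization point of a successful pop
                self.GN.add(t)    # remember the popped node
                return v
```

Each retry of the loop is paid for by a step of another thread, which is the informal reason why the token count can be increased whenever the environment updates S.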
 1 push(v) {
 2   local x, t, b;
     {I ∧ arem(PUSH(V)) ∧ (v = V)}
 3   b := false;
 4   x := cons(v, null);
     {(¬b ∧ I * node(x, v, _) ∧ arem(PUSH(V)) ∧ (v = V))
      ∨ (b ∧ I ∧ arem(skip))}
     // Applying the while rule and the hide-w rule
 5   while (!b) {
       {¬b ∧ I * node(x, v, _) ∧ arem(PUSH(V)) ∧ (v = V) ∧ wf(0)}
 6     < t := S; >
 7     x.next := t;
       {¬b ∧ I * node(x, v, t) ∧ arem(PUSH(V)) ∧ (v = V)
        ∧ ∃a. (S = a) * true ∧ (t = a ∧ wf(0) ∨ t ≠ a ∧ wf(1))}
 8     b := cas(&S, t, x);
       {(b ∧ I ∧ arem(skip) ∧ wf(1))
        ∨ (¬b ∧ I * node(x, v, _) ∧ arem(PUSH(V)) ∧ (v = V) ∧ wf(1))}
 9   }
     {I ∧ arem(skip)}
10 }


   IntSet GN;  // Auxiliary global variable for verification: popped garbage nodes

 1 pop() {
 2   local v, x, t, b;
     {I ∧ arem(POP)}
 3   b := false;
     {(¬b ∧ I ∧ arem(POP)) ∨ (b ∧ I ∧ arem(skip) ∧ (v = V))}
     // Applying the while rule and the hide-w rule
 4   while (!b) {
       {¬b ∧ I ∧ arem(POP) ∧ wf(0)}
 5     < t := S; >
       {(t = null ∧ ¬b ∧ I ∧ arem(skip) ∧ (V = EMPTY) ∧ wf(1))
        ∨ (¬b ∧ I ∧ arem(POP) ∧ ∃a. (S = a) * node(t) * true ∧ (t = a ∧ wf(0) ∨ t ≠ a ∧ wf(1)))}
 6     if (t = null) {
         {t = null ∧ ¬b ∧ I ∧ arem(skip) ∧ (V = EMPTY)}
 7       v := EMPTY;
 8       b := true;
         {b ∧ I ∧ arem(skip) ∧ (v = V = EMPTY)}
 9     } else {
         {¬b ∧ I ∧ arem(POP) ∧ ∃a. (S = a) * node(t) * true ∧ (t = a ∧ wf(0) ∨ t ≠ a ∧ wf(1))}
10       v := t.data;
11       x := t.next;
         {¬b ∧ I ∧ arem(POP) ∧ ∃a. (S = a) * node(t, v, x) * true ∧ (t = a ∧ wf(0) ∨ t ≠ a ∧ wf(1))}
12       < b := cas(&S, t, x); GN := GN ∪ {t}; >
         {(b ∧ I ∧ arem(skip) ∧ (v = V) ∧ wf(1)) ∨ (¬b ∧ I ∧ arem(POP) ∧ wf(1))}
13     }
14   }
     {I ∧ arem(skip) ∧ (v = V)}
15   return v;
16 }

Figure 24: Proof outline for Treiber stack.

4.4 MS Lock-Free Queue


Concrete implementation:

 1 enq(v) {
 2   local x, t, s, b;
 3   b := false;
 4   x := cons(v, null);
 5   while (!b) {
 6     < t := Tail; >
 7     s := t.next;
 8     if (t = Tail) {
 9       if (s = null) {
10         b := cas(&(t.next), s, x);
11         if (b) {
12           cas(&Tail, t, x);
13         }
14       } else {
15         cas(&Tail, t, s);
16       }
17     }
18   }
19 }

 1 deq() {
 2   local v, s, h, t, b;
 3   b := false;
 4   while (!b) {
 5     < h := Head; >
 6     < t := Tail; >
 7     s := h.next;
 8     if (h = t) {
 9       if (s = null) {
10         v := EMPTY;
11         b := true;
12       } else {
13         cas(&Tail, t, s);
14       }
15     } else {
16       v := s.val;
17       b := cas(&Head, h, s);
18     }
19   }
20   return v;
21 }

Abstract atomic operations:

 1 ENQ(V) {
 2   < Q := Q::V; >
 3 }

 1 DEQ() {
 2   local v;
 3   < if (Q = ε) {
 4       v := EMPTY;
 5     } else {
 6       v := head(Q);
 7       Q := tail(Q);
 8     }
 9   >
10   return v;
11 }

Figure  25:  Variant  of  MS  lock-free  queue. 


In Figure 25 we show a variant* of the Michael-Scott lock-free queue [8] (the methods enq and deq)
and the abstract atomic operations (ENQ and DEQ). We use our logic to prove the linearizability and
lock-freedom of the MS queue together. By arguments similar to those for the Treiber stack in
Section 4.3, here we only need to prove the following:


R, G, I ⊢ {I ∧ arem(ENQ(V)) ∧ (v = V)} enq(v) {I ∧ arem(skip)}
and R, G, I ⊢ {I ∧ arem(DEQ)} deq {I ∧ arem(skip) ∧ (v = V)}


We define the precise invariant I, the rely R and the guarantee G in Figure 26 and show the proofs in
Figures 27 and 28. The invariant I for the well-formed shared data structure is defined in the same way
as in linearizability proofs. Here we introduce an auxiliary variable GH to collect those nodes
which were dequeued from the list. It is initially set to Head and never changes afterwards. Then the
list segment from GH to Head includes all the dequeued nodes.

The  rely  R  and  the  guarantee  G  contain  three  actions  in  addition  to  identity  transitions:  Enq,  Deq  and 
Swing.  The  actions  Enq  and  Deq  insert  and  remove  a  node  from  the  queue,  and  correspond  to  abstract 
steps  (the  effect  bits  are  true).  The  action  Swing  moves  the  Tail  pointer,  which  does  not  correspond  to 
any  abstract  steps. 

The proofs in Figures 27 and 28 are based on the linearizability proofs but also take into
account the lock-freedom property.† We need to specify in the loop invariants (in both Figures 27 and 28)
the least number n of tokens to execute the loops (i.e., the thread can only run the loop for no more
than n rounds before it or its environment fulfills some source steps). For instance, in the proof for enq
(Figure 27), when the Tail pointer lags behind the last node, we need to have at least two tokens to first
advance the Tail pointer in one iteration and then enqueue a node in another iteration. Thus we define
tw (in Figure 27), saying that we have at least two tokens if Tail lags behind and one token otherwise.

It is part of our loop invariants in both the proofs for enq and deq. Moreover, to maintain this loop
invariant, we should get two more tokens whenever the environment enqueues a node (such that the Tail
pointer lags behind the last node) and makes the cas of the current thread fail.


* We removed in deq the double check on the read of the Head pointer. As explained in our previous work [6], this double
check introduces a non-fixed linearization point in this queue algorithm, but removing it does not affect the correctness
of the algorithm. Currently we use a simplified setting and do not support non-fixed linearization points (since they are
orthogonal to our main focus in this paper on termination preservation). If we extended the logic in this paper
with the techniques for verifying linearizability with non-fixed linearization points, we would be able to verify the
original MS queue implementation. For the same reason, we remove the double check in the DGLM queue implementation
as well.

† We actually found that the lock-freedom proofs in Hoffmann et al.'s work have bugs in computing the number of
tokens. The authors confirmed our finding in private communication.


I ≝ ∃h, t, A. (Q = A) ∧ (Head = h) * (Tail = t) * lsq(h, t, A) * garb(h)

node(x, v, y) ≝ x ↦ (v, y)        node(x, y) ≝ node(x, _, y)        garb(h) ≝ ∃g. (GH = g) * ls(g, h)

lsq(h, t, A) ≝ ∃v, A', A''. (v::A = A'::A'') ∧ ls(h, A', t) * tls(t, A'')

ls(x, A, y) ≝ (x = y ∧ A = ε ∧ emp) ∨ (x ≠ y ∧ ∃z, v, A'. A = v::A' ∧ node(x, v, z) * ls(z, A', y))

ls(x, y) ≝ ∃A. ls(x, A, y)

last2(t, v, x, v') ≝ node(t, v, x) * node(x, v', null)        last2(t, x) ≝ last2(t, _, x, _)        last2(t) ≝ last2(t, _)

tls(t, x, A) ≝ ∃v, v'. (A = v ∧ node(t, v, x) ∧ x = null) ∨ (A = v::v' ∧ last2(t, v, x, v'))        tls(t, x) ≝ ∃A. tls(t, x, A)

R = G ≝ (Enq ∨ Deq ∨ Swing ∨ Id) * Id ∧ (I ⋉ I)

Enq ≝ ∃v, v', A, t, x. ((Q = A) ∧ (Tail = t) * node(t, v, null)) ∝ ((Q = A::v') ∧ (Tail = t) * last2(t, v, x, v'))

Deq ≝ ∃v, A, h, t, x, y. ((Q = v::A) ∧ (Head = h) * node(h, x) * node(x, v, y) * (Tail = t) ∧ h ≠ t)
        ∝ ((Q = A) ∧ (Head = x) * node(h, x) * node(x, v, y) * (Tail = t))

Swing ≝ ∃v, v', t, x. (emp ∧ (Tail = t) * last2(t, v, x, v')) ⋉ (emp ∧ (Tail = x) * last2(t, v, x, v'))

Figure 26: Precise invariant, rely and guarantee of MS lock-free queue. The auxiliary global variable GH
is set to Head in the initialization method.
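The two-token requirement for a lagging Tail can be observed on a small executable sketch of the enq loop of Figure 25. This is an illustration only: cas is simulated with a global lock, the names Node, Queue, cas, enq and contents are assumptions of this sketch, and the returned round count is instrumentation that is not part of the algorithm.

```python
import threading

_lock = threading.Lock()

def cas(obj, field, expected, new):
    """Simulated compare-and-swap on obj.field (assumption: one global lock)."""
    with _lock:
        if getattr(obj, field) is expected:
            setattr(obj, field, new)
            return True
        return False

class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

class Queue:
    def __init__(self):
        sentinel = Node(None)
        self.Head = sentinel
        self.Tail = sentinel

def enq(q, v):
    """enq of Figure 25, instrumented to return the number of loop rounds used."""
    x = Node(v)
    rounds = 0
    while True:
        rounds += 1
        t = q.Tail
        s = t.next
        if t is q.Tail:
            if s is None:
                if cas(t, "next", None, x):  # line 10: link the new node
                    cas(q, "Tail", t, x)     # line 12: swing Tail
                    return rounds
            else:
                cas(q, "Tail", t, s)         # line 15: help a lagging Tail

def contents(q):
    out, n = [], q.Head.next
    while n is not None:
        out.append(n.val)
        n = n.next
    return out

q = Queue()
r1 = enq(q, 1)             # Tail is up to date: one round suffices
q.Tail.next = Node(2)      # simulate an environment enq that has not yet swung Tail
r2 = enq(q, 3)             # first round advances Tail, second round enqueues
print(r1, r2, contents(q))
```

With an up-to-date Tail the call uses one round; with a lagging Tail it uses two, matching the one-token and two-token cases of tw.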
tw(t) ≝ (Tail = t) * ((last2(t) ∧ wf(2)) ∨ (node(t, null) ∧ wf(1)))        tw ≝ ∃t. tw(t)

tw'(t, n) ≝ (Tail = t) * ((last2(t, n) ∧ wf(1)) ∨ (node(t, n) ∧ n = null ∧ wf(0)))
tw'(t) ≝ tw'(t, _)        tw' ≝ ∃t. tw'(t)

newTail(n) ≝ (node(n, null) * (Tail = n) ∧ wf(1)) ∨ (last2(n) * (Tail = n) ∧ wf(2))
             ∨ (∃x, y. node(n, x) * ls(x, y) * tw(y) ∧ wf(2))

readTailEnvAdv(t, n) ≝ node(t, n) * newTail(n)        readTailEnvAdv(t) ≝ readTailEnvAdv(t, _)
readTail(t) ≝ tw'(t) ∨ readTailEnvAdv(t)

readTailNextNullEnv(t, n) ≝ (n = null) ∧ (((Tail = t) * last2(t) ∧ wf(2)) ∨ readTailEnvAdv(t))
readTailNext(t, n) ≝ tw'(t, n) ∨ readTailEnvAdv(t, n) ∨ readTailNextNullEnv(t, n)
readTailNextNull(t, n) ≝ ((Tail = t) * node(t, n) ∧ n = null ∧ wf(0)) ∨ readTailNextNullEnv(t, n)
readTailNextNonnull(t, n) ≝ ((Tail = t) * last2(t, n) ∧ wf(1)) ∨ readTailEnvAdv(t, n)


enq(v) {
  local x, t, s, b;
  {I ∧ arem(ENQ(V)) ∧ (v = V)}
  b := false;
  x := cons(v, null);
  {(¬b ∧ I * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V))
   ∨ (b ∧ I ∧ arem(skip))}
  // Applying the while rule and the hide-w rule
  while (!b) {
    {¬b ∧ (I ∧ tw' * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V)}
    < t := Tail; >
    {¬b ∧ (I ∧ readTail(t) * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V)}
    s := t.next;
    {¬b ∧ (I ∧ readTailNext(t, s) * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V)}
    if (t = Tail) {
      {¬b ∧ (I ∧ readTailNext(t, s) * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V)}
      if (s = null) {
        {¬b ∧ (I ∧ readTailNextNull(t, s) * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V)}
        b := cas(&(t.next), s, x);
        {(b ∧ I ∧ readTailNextNonnull(t, x) * true ∧ arem(skip))
         ∨ (¬b ∧ (I ∧ readTailNextNullEnv(t, s) * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V))}
        if (b) {
          {b ∧ I ∧ readTailNextNonnull(t, x) * true ∧ arem(skip)}
          cas(&Tail, t, x);
          {b ∧ I ∧ arem(skip)}
        }
        {(b ∧ I ∧ arem(skip))
         ∨ (¬b ∧ (I ∧ tw * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V))}
      } else {
        {¬b ∧ (I ∧ readTailNextNonnull(t, s) * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V)}
        cas(&Tail, t, s);
        {¬b ∧ (I ∧ tw * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V)}
      }
    }
    {(¬b ∧ (I ∧ tw * true) * node(x, v, null) ∧ arem(ENQ(V)) ∧ (v = V))
     ∨ (b ∧ I ∧ arem(skip))}
  }
  {I ∧ arem(skip)}
}

Figure 27: Proof outline for enq of MS lock-free queue.

readHeadEnv(h, n, x) ≝ (h ≠ x) ∧ node(h, n) * ls(n, x) * (Head = x)

readHead(h, x) ≝ ((h = x) ∧ (Head = x)) ∨ (readHeadEnv(h, _, x) * wf(1))        readHead(h) ≝ readHead(h, _)

readTailAfterHead(h, t) ≝ ∃x. readHead(h, x) * ls(x, t) * readTail(t)

readHeadNextAfterTail(h, n, t) ≝ (((Head = h) ∧ (h = t)) * readTailNext(t, n))
                                 ∨ ((Head = h) * node(h, n) * ls(n, t) * readTail(t))
                                 ∨ (∃x. readHeadEnv(h, n, x) * wf(1) * ls(x, t) * readTail(t))

readHeadNextVal(h, n, v) ≝ ((Head = h) * node(h, n) * node(n, v, _) * (Tail = n))
                           ∨ (∃x, t. (Head = h) * node(h, n) * node(n, v, x) * ls(x, t) * (Tail = t))
                           ∨ (readHeadEnv(h, n, _) * tw)

 1 deq() {
 2   local v, s, h, t, b;
     {I ∧ arem(DEQ)}
 3   b := false;
     {(¬b ∧ I ∧ arem(DEQ)) ∨ (b ∧ I ∧ arem(skip) ∧ (v = V))}
     // Applying the while rule and the hide-w rule
 4   while (!b) {
       {¬b ∧ I ∧ tw' * true ∧ arem(DEQ)}
 5     < h := Head; >
       {¬b ∧ I ∧ tw' * readHead(h) * true ∧ arem(DEQ)}
 6     < t := Tail; >
       {¬b ∧ I ∧ readTailAfterHead(h, t) * true ∧ arem(DEQ)}
 7     s := h.next;
       {¬b ∧ I ∧ readHeadNextAfterTail(h, s, t) * true
        ∧ ((h = t ∧ s = null ∧ arem(skip) ∧ V = EMPTY) ∨ ((h ≠ t ∨ s ≠ null) ∧ arem(DEQ)))}
 8     if (h = t) {
 9       if (s = null) {
           {¬b ∧ I ∧ h = t ∧ s = null ∧ arem(skip) ∧ V = EMPTY}
10         v := EMPTY;
11         b := true;
           {b ∧ I ∧ arem(skip) ∧ (v = V = EMPTY)}
12       } else {
           {¬b ∧ I ∧ readHeadNextAfterTail(h, s, t) * true ∧ h = t ∧ s ≠ null ∧ arem(DEQ)}
           {¬b ∧ I ∧ readTailNextNonnull(t, s) * true ∧ arem(DEQ)}
13         cas(&Tail, t, s);
           {¬b ∧ I ∧ tw * true ∧ arem(DEQ)}
14       }
15     } else {
         {¬b ∧ I ∧ readHeadNextAfterTail(h, s, t) * true ∧ h ≠ t ∧ arem(DEQ)}
16       v := s.val;
         {¬b ∧ I ∧ readHeadNextAfterTail(h, s, t) * true ∧ node(s, v, _) * true ∧ h ≠ t ∧ arem(DEQ)}
         {¬b ∧ I ∧ readHeadNextVal(h, s, v) * true ∧ arem(DEQ)}
17       < b := cas(&Head, h, s); >
         {(¬b ∧ I ∧ tw * true ∧ arem(DEQ)) ∨ (b ∧ I ∧ arem(skip) ∧ (v = V))}
18     }
       {(¬b ∧ I ∧ tw * true ∧ arem(DEQ)) ∨ (b ∧ I ∧ arem(skip) ∧ (v = V))}
19   }
     {I ∧ arem(skip) ∧ (v = V)}
20   return v;
21 }


Figure 28: Proof outline for a variant of deq in MS lock-free queue.

4.5 DGLM Lock-Free Queue


 1 enq(v) {
 2   local x, t, s, b;
 3   b := false;
 4   x := cons(v, null);
 5   while (!b) {
 6     < t := Tail; >
 7     s := t.next;
 8     if (t = Tail) {
 9       if (s = null) {
10         b := cas(&(t.next), s, x);
11         if (b) {
12           cas(&Tail, t, x);
13         }
14       } else {
15         cas(&Tail, t, s);
16       }
17     }
18   }
19 }

 1 deq() {
 2   local v, s, h, t, b;
 3   b := false;
 4   while (!b) {
 5     < h := Head; >
 6     s := h.next;
 7     if (s = null) {
 8       v := EMPTY;
 9       b := true;
10     } else {
11       v := s.val;
12       b := cas(&Head, h, s);
13       if (b) {
14         < t := Tail; >
15         if (h = t) {
16           cas(&Tail, t, s);
17         }
18       }
19     }
20   }
21   return v;
22 }

Figure  29:  Variant  of  DGLM  lock-free  queue. 


Doherty et al. present an optimized version of the deq method in the MS lock-free queue, and verify
linearizability of the algorithm by constructing a forward and a backward simulation. Here we prove its
linearizability and lock-freedom together. We show a variant* of the code in Figure 29. Its enq method
is the same as in the MS lock-free queue. For deq, it tests whether Tail points to the sentinel node (line 15
in Figure 29) only after Head has been updated (line 12), while in Michael and Scott’s version, the test
(line 8 in the deq of Figure 25) is performed before knowing that the queue is not empty.

The precise invariant I and the rely/guarantee conditions R and G are almost the same as for the MS
lock-free queue, as shown in Figure 30. The proof for enq is the same as that of the MS lock-free queue. In
Figure 31 we show the proof of the deq method for the DGLM queue using our logic. Unlike in
the deq method of the MS queue, here we do not first use one iteration to advance the Tail pointer
before dequeuing nodes (instead, only after we have dequeued a node may we advance the Tail pointer,
as shown at line 16 of the deq method in Figure 29). Thus in the loop invariant, we no longer need to
have at least two tokens when Tail lags behind the last node. We can just use wf(1) as the loop invariant
on the number of tokens, for all cases.
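The difference in token requirements can be made concrete by running both deq variants on a queue whose Tail lags behind. The sketch below is illustrative only: cas is simulated with a global lock, and the names Node, Queue, cas, deq_ms, deq_dglm and lagging_queue, as well as the returned round counts, are assumptions of this sketch rather than part of either algorithm.

```python
import threading

_lock = threading.Lock()
EMPTY = object()

def cas(obj, field, expected, new):
    """Simulated compare-and-swap on obj.field (assumption: one global lock)."""
    with _lock:
        if getattr(obj, field) is expected:
            setattr(obj, field, new)
            return True
        return False

class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

class Queue:
    def __init__(self, head, tail):
        self.Head = head
        self.Tail = tail

def deq_ms(q):
    """deq of Figure 25; returns (value, rounds)."""
    rounds = 0
    while True:
        rounds += 1
        h, t = q.Head, q.Tail
        s = h.next
        if h is t:
            if s is None:
                return EMPTY, rounds
            cas(q, "Tail", t, s)          # must first help the lagging Tail
        else:
            v = s.val
            if cas(q, "Head", h, s):
                return v, rounds

def deq_dglm(q):
    """deq of Figure 29; returns (value, rounds)."""
    rounds = 0
    while True:
        rounds += 1
        h = q.Head
        s = h.next
        if s is None:
            return EMPTY, rounds
        v = s.val
        if cas(q, "Head", h, s):
            t = q.Tail
            if h is t:
                cas(q, "Tail", t, s)      # fix Tail only after the dequeue succeeded
            return v, rounds

def lagging_queue():
    # One element is linked after the sentinel, but Tail still points to the sentinel.
    sentinel = Node(None, Node(1))
    return Queue(sentinel, sentinel)

ms_result = deq_ms(lagging_queue())
dglm_queue = lagging_queue()
dglm_result = deq_dglm(dglm_queue)
print(ms_result, dglm_result)
```

On this input the MS variant spends one round swinging Tail before it can dequeue, while the DGLM variant dequeues in its first round and repairs Tail afterwards.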


I ≝ ∃h, t, A. (&Q ⤇ A) ∧ (&Head ↦ h) * (&Tail ↦ t) * (lsq(h, t, A) ∨ cross(h, t, A)) * garb(h)

cross(h, t, A) ≝ (A = ε) ∧ node(t, h) * node(h, null)

R = G ≝ (Enq ∨ Deq ∨ Swing ∨ Id) * Id ∧ (I ⋉ I)

Deq ≝ ∃v, A, h, x, y. ((&Q ⤇ v::A) ∧ (&Head ↦ h) * node(h, x) * node(x, v, y))
        ∝ ((&Q ⤇ A) ∧ (&Head ↦ x) * node(h, x) * node(x, v, y))


Figure 30: Precise invariant, rely and guarantee of DGLM lock-free queue. Here lsq, garb, Enq and Swing
are the same as those for MS queue.


* As for the MS lock-free queue, we also remove the double check on the read of Head in the deq method of the DGLM queue.

readHeadNextNullEnv(h, n) ≝ (n = null) ∧ ∃x, y. node(h, x) * ((node(x, y) * (&Head ↦ h)) ∨ (ls(x, y) * (&Head ↦ y)))
readHeadNext(h, n) ≝ (node(h, n) * (&Head ↦ h)) ∨ (readHeadEnv(h, n, x) * wf(1)) ∨ readHeadNextNullEnv(h, n)
readHeadNextVal(h, n, v) ≝ ((&Head ↦ h) * node(h, n) * node(n, v, _)) ∨ (readHeadEnv(h, n, x) * wf(1))
readTailEnvAdv(t, n) ≝ ∃x. (x ≠ t) ∧ node(t, n) * ls(n, x) * (&Tail ↦ x)
readTail(t) ≝ ((&Tail ↦ t) * tls(t, _)) ∨ readTailEnvAdv(t, _)
readLagTail(t, n) ≝ ((&Tail ↦ t) * last2(t, n)) ∨ readTailEnvAdv(t, n)

 1 deq() {
 2   local v, s, h, t, b;
     {I ∧ arem(DEQ)}
 3   b := false;
     {(¬b ∧ I ∧ arem(DEQ)) ∨ (b ∧ I ∧ arem(skip) ∧ (v = V))}
     // Applying the while rule and the hide-w rule
 4   while (!b) {
       {¬b ∧ I ∧ arem(DEQ) ∧ wf(0)}
 5     < h := Head; >
       {¬b ∧ I ∧ readHead(h) * true ∧ arem(DEQ)}
 6     s := h.next;
       {¬b ∧ I ∧ readHeadNext(h, s) * true
        ∧ ((s = null ∧ arem(skip) ∧ V = EMPTY) ∨ (s ≠ null ∧ arem(DEQ)))}
 7     if (s = null) {
         {¬b ∧ I ∧ s = null ∧ arem(skip) ∧ V = EMPTY}
 8       v := EMPTY;
 9       b := true;
         {b ∧ I ∧ arem(skip) ∧ (v = V = EMPTY)}
10     } else {
         {¬b ∧ I ∧ readHeadNext(h, s) * true ∧ (s ≠ null) ∧ arem(DEQ)}
11       v := s.val;
         {¬b ∧ I ∧ readHeadNextVal(h, s, v) * true ∧ arem(DEQ)}
12       b := cas(&Head, h, s);
         {(b ∧ I ∧ node(h, s) * node(s, _) * true ∧ arem(skip) ∧ (v = V)) ∨ (¬b ∧ I ∧ arem(DEQ) ∧ wf(1))}
13       if (b) {
           {b ∧ I ∧ node(h, s) * node(s, _) * true ∧ arem(skip) ∧ (v = V)}
14         < t := Tail; >
           {b ∧ I ∧ node(h, s) * node(s, _) * true ∧ readTail(t) * true ∧ arem(skip) ∧ (v = V)}
15         if (h = t) {
             {b ∧ I ∧ readLagTail(t, s) * true ∧ arem(skip) ∧ (v = V)}
16           cas(&Tail, t, s);
17         }
           {b ∧ I ∧ arem(skip) ∧ (v = V)}
18       }
19     }
       {(¬b ∧ I ∧ arem(DEQ) ∧ wf(1)) ∨ (b ∧ I ∧ arem(skip) ∧ (v = V))}
20   }
     {I ∧ arem(skip) ∧ (v = V)}
21   return v;
22 }

Figure 31: Proof outline for a variant of deq in DGLM lock-free queue. Here readHead and readHeadEnv
are the same as those for MS queue.

4.6 Synchronous Queue


 1 initialize() {
 2   local sentinel;
 3   sentinel := new Node(null, DATA, null);
 4   GH := Head := Tail := sentinel;
 5 }

 1 enq(v) {
 2   local t, h, n, offer, b, v';
 3   b := false;
 4   offer := new Node(v, DATA, null);
 5   while (!b) {
 6     t := Tail;
 7     h := Head;
 8     if (h = t || t.type = DATA) {
 9       n := t.next;
10       if (t = Tail) {
11         if (n != null) {
12           cas(&Tail, t, n);
13         } else if (cas(&(t.next), n, offer)) {
14           cas(&Tail, t, offer);
15           v' := offer.data;
16           while (v' = v) { v' := offer.data; }
17           h := Head;
18           if (offer = h.next)
19             cas(&Head, h, offer);
20           b := true;
21         }
22       }
23     } else {
24       n := h.next;
25       if (t = Tail && h = Head && n != null) {
26         b := cas(&(n.data), null, v);
27         cas(&Head, h, n);
28         if (b) free(offer);
29       }
30     }
31   }
32 }

 1 deq() {
 2   local t, h, n, req, b, v;
 3   b := false;
 4   req := new Node(null, REQ, null);
 5   while (!b) {
 6     t := Tail;
 7     h := Head;
 8     if (h = t || t.type = REQ) {
 9       n := t.next;
10       if (t = Tail) {
11         if (n != null) {
12           cas(&Tail, t, n);
13         } else if (cas(&(t.next), n, req)) {
14           cas(&Tail, t, req);
15           v := req.data;
16           while (v = null) { v := req.data; }
17           h := Head;
18           if (req = h.next)
19             cas(&Head, h, req);
20           b := true;
21         }
22       }
23     } else {
24       n := h.next;
25       if (t = Tail && h = Head && n != null) {
26         v := n.data;
27         if (v != null) {
28           b := cas(&(n.data), v, null);
29         }
30         cas(&Head, h, n);
31         if (b) free(req);
32       }
33     }
34   }
35   return v;
36 }

Figure  32:  Synchronous  dual  queue.  Here  GH  is  an  auxiliary  variable. 


A synchronous queue is a concurrent transfer channel in which each producer presenting an item must
wait for a consumer to take this item, and vice versa. We show the implementation of the synchronous
queue (used in Java 6 [5]) in Figure 32. It is based on the Michael-Scott queue. At any time, the queue
either contains enq reservations (nodes whose type fields are DATA), contains deq reservations (nodes
whose type fields are REQ), or is empty. In the enq method (also known as put), a thread first checks if
the queue is empty or contains DATA-type reservations (line 8 in enq in Figure 32). If so, it enqueues
(puts in) its new DATA-type reservation (lines 13 and 14 in enq), and waits at the item for a deq thread
to take it (lines 15 and 16 in enq). When a deq thread finds this reservation, it takes away the data
contained in the item (line 26 in deq), sets the data field to null (line 28 in deq) and removes this item
(line 30 in deq). When the waiting enq thread finds that the item has been taken, it can also try to
remove the item (lines 18 and 19 in enq). Symmetrically, a deq thread first checks if the queue is empty
or contains REQ-type reservations (line 8 in deq), and if so, it enqueues (puts in) its new REQ-type
reservation (lines 13 and 14 in deq), and waits for an enq thread to fulfill it (lines 15 and 16 in deq).

The synchronous queue does not satisfy the traditional linearizability definition [?]. But we can see
that the steps for a thread to put in its reservation (which are actually like the enq method of the MS
queue in Figure 25) are “linearizable” and “lock-free” (in that the multiple steps can be abstracted as an
atomic operation), and the steps for taking away the data or fulfilling the reservation (which are like the
deq method of the MS queue) are also “linearizable” and “lock-free”. The waiting steps are certainly not
“lock-free”, since they require interactions from other threads to progress. We can define non-atomic
abstract code and prove that the synchronous queue implementation refines it.


 1 ENQ(V) {
 2   local nd, mustWait, va;
 3   < nd := dequeue(D);
 4     mustWait := (nd = null);
 5     if (mustWait) { nd := enqueue(E, V); }
 6   >
 7   if (mustWait) {
 8     va := nd.data;
 9     while (va = V) { va := nd.data; }
10   }
11   else {
12     nd.data := V;
13   }
14 }

 1 DEQ() {
 2   local nd, mustWait, v;
 3   < nd := dequeue(E);
 4     mustWait := (nd = null);
 5     if (mustWait) { nd := enqueue(D, null); }
 6   >
 7   if (mustWait) {
 8     v := nd.data;
 9     while (v = null) { v := nd.data; }
10   }
11   else {
12     v := nd.data;
13     nd.data := null;
14   }
15   return v;
16 }


Figure  33:  Abstract  synchronous  queue. 


As shown in Figure 33, the abstract code follows the Java SE 5.0 SynchronousQueue class. We maintain
two abstract queues: D for waiting dequeuers and E for waiting enqueuers. Each queue is a mathematical
list of node addresses (as an abstraction/simplification of a linked list). The command enqueue(E, v)
allocates a new abstract node with data v, inserts its address at the tail of the queue E, and returns
the address. The command dequeue(E) removes the first item (a node address) from the queue E and
returns it if E is not empty (E ≠ ε), and returns null otherwise.

In the ENQ method, a thread first checks if D is empty (line 4 of ENQ in Figure 33), and if so, it
atomically puts in its reservation to E (line 5). Then it waits for a deq thread to take away the data in
the reservation (lines 8 and 9). If D is not empty, then it dequeues a reservation from D and writes its
enqueued value V to the data field of the reservation (line 12). The DEQ method is symmetric.
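The abstract operations can be exercised with a small executable model. This is only a sketch under stated assumptions: the atomic blocks are modelled with a lock, the waiting loops yield with time.sleep(0), enqueued values are assumed to be non-null, and the names AbsNode and AbstractSyncQueue are hypothetical.

```python
import threading
import time

class AbsNode:
    def __init__(self, data):
        self.data = data

class AbstractSyncQueue:
    def __init__(self):
        self.E = []                        # reservations of waiting enqueuers
        self.D = []                        # reservations of waiting dequeuers
        self._atomic = threading.Lock()    # models the atomic blocks < ... >

    def ENQ(self, V):
        with self._atomic:                 # lines 3-6 of ENQ
            nd = self.D.pop(0) if self.D else None
            must_wait = nd is None
            if must_wait:
                nd = AbsNode(V)
                self.E.append(nd)
        if must_wait:
            while nd.data == V:            # lines 8-9: wait until a dequeuer takes V
                time.sleep(0)
        else:
            nd.data = V                    # line 12: fulfill a waiting dequeuer

    def DEQ(self):
        with self._atomic:                 # lines 3-6 of DEQ
            nd = self.E.pop(0) if self.E else None
            must_wait = nd is None
            if must_wait:
                nd = AbsNode(None)
                self.D.append(nd)
        if must_wait:
            v = nd.data
            while v is None:               # lines 8-9: wait until an enqueuer fulfills
                time.sleep(0)
                v = nd.data
        else:
            v = nd.data                    # lines 12-13: take the data
            nd.data = None
        return v

q = AbstractSyncQueue()
producer = threading.Thread(target=q.ENQ, args=(42,), daemon=True)
producer.start()
received = q.DEQ()
producer.join(5)
print(received)
```

Whichever thread enters its atomic block first places a reservation and waits; the other one finds the reservation and completes the transfer, so the hand-off succeeds in both interleavings.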

To simplify the proof, we assume the abstract state always contains a dummy node whose data is
null. The node is never accessed by the code. It corresponds to the initial sentinel node of the
concrete list.

To prove that the concrete implementation in Figure 32 refines the abstract operations in Figure 33 using
our logic, we first define the invariant I and the rely and guarantee conditions R and G in Figure 34.

The invariant I says that the shared memory contains the queue Q and some garbage nodes Garb which
were removed from the queue by either enq or deq. As usual we introduce an auxiliary variable GH to
collect those nodes which were removed from the list. It is initially set to Head and never changes
afterwards. Then the list segment (Gls) from GH to Head includes all the removed nodes. These removed
nodes must have been sentinel nodes (Stnl), i.e., DATA-type nodes whose data has been taken or
REQ-type nodes whose data has been fulfilled. The queue Q is either a DATA-type queue (and the
abstract D must be empty) or a REQ-type queue (and the abstract E must be empty). It always
contains one or two sentinel nodes (the two-sentinel case occurs since the Head pointer may lag behind
the new sentinel node). As in the MS queue, the Tail pointer may lag behind the last node. But if Head
lags behind the new sentinel node, Tail cannot be equal to Head, as indicated by the implementation
in Figure 32.

The rely and guarantee conditions contain six possible actions in addition to the identity transitions.
AdvHead and AdvTail swing the Head and Tail pointers when they lag behind. These two actions
do not correspond to any abstract step. ResvE and ResvD each insert a new node at the tail of the
queue. Put fulfills the data field of a REQ-type node at the head of the queue, and Take takes away the
data of a DATA-type node. They both turn a normal node into a sentinel node. The four actions ResvE,
ResvD, Put and Take correspond to abstract steps and thus their effect bits must be true.
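The distinction between normal nodes and sentinel nodes, and the way Put and Take convert one into the other, can be sketched as follows. This is a sequential illustration only: in the implementation these updates are performed by cas on the data field, and the names Node, is_sentinel, is_queue_node, take and put are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Optional

DATA, REQ = "DATA", "REQ"

@dataclass
class Node:
    data: Any
    type: str
    next: Optional["Node"] = None

def is_sentinel(n):
    # A DATA node whose data has been taken, or a REQ node whose data has been fulfilled.
    return (n.type == DATA and n.data is None) or (n.type == REQ and n.data is not None)

def is_queue_node(n):
    # A pending reservation: a DATA node still holding data, or an unfulfilled REQ node.
    return (n.type == DATA and n.data is not None) or (n.type == REQ and n.data is None)

def take(n):
    """Take action: remove the data of a DATA-type reservation, making it a sentinel."""
    assert n.type == DATA and is_queue_node(n)
    v, n.data = n.data, None
    return v

def put(n, v):
    """Put action: fulfill a REQ-type reservation with a non-null value, making it a sentinel."""
    assert n.type == REQ and is_queue_node(n) and v is not None
    n.data = v

initial = Node(None, DATA)   # the sentinel created by initialize()
offer = Node(7, DATA)        # an enq reservation
request = Node(None, REQ)    # a deq reservation
taken = take(offer)
put(request, 9)
print(is_sentinel(initial), taken, is_sentinel(offer), is_sentinel(request))
```

Every node satisfies exactly one of the two predicates, which is why Put and Take are the only actions that move a node from the queue part to the sentinel part of the list.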

We show the proofs of enq in Figures 37 and 38, with some auxiliary predicates defined in Figures 35
and 36. The proof for deq is symmetric and omitted here. Similar to the proofs for the MS queue, we need to
specify in the loop invariants the least number n of tokens to execute the loops (i.e., the thread can only
run the loop for no more than n rounds before it or its environment fulfills some source steps). In the
proof for enq (Figure 37), when either the Head or the Tail pointer lags behind, we need to have at least
two tokens (as defined by loopinv; see the auxiliary predicates in Figures 35 and 36). To maintain this
loop invariant, we should get two more tokens whenever the environment inserts a node at the tail (such
that the Tail pointer lags behind the last node), and whenever the environment makes a normal node
become a sentinel node (such that the Head pointer lags behind the new sentinel).
I ≝ ∃h, t. (Head = h) * (Tail = t) * Q(h, t) * Garb(h)

Q(h, t) ≝ ∃b. Q_b(h, t)

Q_b(h, t) ≝ ∃L. Q_b(h, t, L) * ((b = DATA ∧ (E = L) * (D = ε)) ∨ (b = REQ ∧ (D = L) * (E = ε)))

Q_b(h, t, L) ≝ (Ss_b(h, t, null) ∧ L = ε)
               ∨ (∃x, X. Ss_b(h, t, x) * Qn_b(x, null, X) ∧ L = X::ε)
               ∨ (∃x, L', L''. Ss_b(h, _, x) * Qls_b(x, t, L') * Qtl_b(t, _, L'') ∧ L = L'::L'')

Garb(h) ≝ ∃g. (GH = g) * Gls(g, h)

Ss(x, y, z) ≝ ∃b. Ss_b(x, y, z)        Ss_b(x, y, z) ≝ (Stnl_b(x, z) ∧ (x = y)) ∨ (Stnl(x, y) * Stnl_b(y, z))

Gls(x, y) ≝ (x = y) ∨ (x ≠ y ∧ ∃z. Stnl(x, z) * Gls(z, y))

Qls_b(x, y, L) ≝ (x = y ∧ L = ε) ∨ (x ≠ y ∧ ∃z, X, L'. L = X::L' ∧ Qn_b(x, z, X) * Qls_b(z, y, L'))

Qtl_b(x, y, L) ≝ (∃X. Qn_b(x, y, X) ∧ y = null ∧ L = X::ε)
                 ∨ (∃X, Y. Qn_b(x, y, X) * Qn_b(y, null, Y) ∧ L = X::Y::ε)

Stnl(x, y) ≝ ∃b. Stnl_b(x, y, _)        Stnl_b(x, v, y, X) ≝ stnl_b(x, v, y) ∧ NODE(X, v)

Qn_b(x, y, X) ≝ Qn_b(x, _, y, X)        Qn_b(x, v, y, X) ≝ qn_b(x, v, y) ∧ NODE(X, v)

stnl_b(x, v, y) ≝ node_b(x, v, y) ∧ ((b = DATA ∧ v = null) ∨ (b = REQ ∧ v ≠ null))

qn_b(x, v, y) ≝ node_b(x, v, y) ∧ ((b = DATA ∧ v ≠ null) ∨ (b = REQ ∧ v = null))

node_b(x, v, y) ≝ x ↦ (v, b, y)        NODE(X, v) ≝ X ⤇ (v)

stnl(x, y) ≝ ∃b. stnl_b(x, _, y)        qn(x, y) ≝ ∃b. qn_b(x, _, y)        node(x, y) ≝ ∃b. node_b(x, _, y)

stnl_b(x, y) ≝ stnl_b(x, _, y)        node_b(x, y) ≝ node_b(x, _, y)        node(x, v, y) ≝ ∃b. node_b(x, v, y)

R = G ≝ (AdvHead ∨ AdvTail ∨ ResvE ∨ ResvD ∨ Put ∨ Take ∨ Id) * Id ∧ (I ⋉ I)

AdvHead ≝ ∃x, y, z. [stnl(x, y) * stnl(y, z) ∧ emp] * ((Head = x) ⋉ (Head = y))

AdvTail ≝ ∃x, y. [node(x, y) * node(y, null) ∧ emp] * ((Tail = x) ⋉ (Tail = y))

ResvE ≝ ∃v, v', b, t, x, L, X. ((Tail = t) * node_b(t, v, null) ∧ (E = L) * (D = ε))
          ∝ ((Tail = t) * node_b(t, v, x) * qn_DATA(x, v', null) ∧ (NODE(X, v') * (E = L::X) * (D = ε)))

ResvD ≝ ∃v, v', b, t, x, L, X. ((Tail = t) * node_b(t, v, null) ∧ (E = ε) * (D = L))
          ∝ ((Tail = t) * node_b(t, v, x) * qn_REQ(x, v', null) ∧ (NODE(X, v') * (E = ε) * (D = L::X)))

Put ≝ ∃h, t, x, y, X, L. [(Head = h) * (Tail = t) * Stnl(h, x) * (E = ε) ∧ (h ≠ t)]
        * ((Qn_REQ(x, y, X) * (D = X::L)) ∝ (Stnl_REQ(x, y, X) * (D = L)))

Take ≝ ∃h, t, x, y, X, L. [(Head = h) * (Tail = t) * Stnl(h, x) * (D = ε) ∧ (h ≠ t)]
        * ((Qn_DATA(x, y, X) * (E = X::L)) ∝ (Stnl_DATA(x, y, X) * (E = L)))
Figure  34:  Precise  invariant,  rely  and  guarantee  of  synchronous  queue.  Here  we  use  Ei  =  E2  and  Ei  =  E2 
short  for  {Ei  =  E2)  A  emp  and  (Ei  =  E2)  A  emp  respectively. 
$\mathsf{node2}_p(t, n, x) \;\stackrel{\text{def}}{=}\; \mathsf{node}_p(t, n) * \mathsf{node}(n, x) \qquad \mathsf{node2}(t, n, x) \;\stackrel{\text{def}}{=}\; \exists p.\ \mathsf{node2}_p(t, n, x)$

stnl2p(/i,  n,  u)  =*  stnl(^,  n)  *  stnlp(n,  w,  _)  stnl2p(/i)  stnl2p(/i,  stnl2(/i)  =*  3p.  stnl2p(/i) 

stnllp(/i,  n,  u)  =*  stnl(/i,  n)  *  qnp(n,  w,  _)  stnllp(/i)  stnllp(/i,  stnll(/!,)  =*  3p.  stnllp(/i) 

gls(a;,y)  =  {x  =  y) -V  {x  ^  y  A3z.  stn\{x,  z)  *  g\s{z,y)) 

\s{x,y)  ='  {x  =  y)  V  {x  ^  y  A3z.  'r\ode{x,  z)  *  \s{z,y)) 

lagTail  node2(Tail,  null)  nonlagTail  =*  node(Tail,  null)  tail  lagTail  V  nonlagTail 
lagHead  =*  stnl2(Head)  nonlagHead  stnl(Head,  null)  V  stnll(Head)  head  =*  lagHead  V  nonlagHead 
loopinv  =*  ((lagTail  V  lagHead)  A  wf(2))  V  (nonlagTail  A  nonlagHead  A  wf(l)) 
loopBody  =*  ((lagTail  V  lagHead)  A  wf(l))  V  (nonlagTail  A  nonlagHead  A  wf(0)) 

newTailp(n,  u)  =*  (nodep(n,  u,  null)  A  (n  =  Tail)  A  wf(l)) 

V  (3a;.  nodep(n,  v,  x)  *  node(a;,  null)  A  (n  =  Tail)  A  wf(2)) 

V  (3a;.  nodep(n,  u,  a;)  *  ls(a;,  Tail)  *  tail  A  wf(2)) 

newTail(n)  3j3,  u.  newTailp(n,  w)  NewTailp(n,  u,  A^)  newTailp(n,  w)  *  NODE(A'^,  u) 

readTailEnvAdVp_q(t,  n,  u)  =*  nodep(t,  n)  *  newTailq(n,  u) 

readTailEnvAdVp(t)  =*  3q.  readTailEnvAdVp^g(t,  _,  _)  readTailEnvAdVp(t,  n)  =*  3g.  readTailEnvAdVp_p(t,  n,  _) 
readTailp(t)  =*  (t  =  Tail  A  (node2p(t,  null)  V  nodep(t, null)))  V  readTailEnvAdVp(t) 
readTailNextNullEnVp(t,  n)  (n  =  null)  A  {{t  =  Tail  A  node2p(t,  null)  A  wf(2))  V  readTailEnvAdVp(t)) 

readTailNextp(t,  n)  {t  —  Tail  A  (node2p(t,  n,  null)  V  (nodep(t,  n)  An  =  null))) 

V  readTailEnvAdVp(t,  n)  V  readTailNextNullEnVp(t,  n) 

readTailNextNonnullp(t,  n)  (t  =  Tail  A  node2p(t,  n,  null)  A  wf(l))  V  readTailEnvAdVp(t,  n) 
readTailNextNullp(t,  n)  =*  (t  =  Tail  A  nodep(t,  n)  A  n  =  null  A  wf(0))  V  readTailNextNullEnVp(t,  n) 
EnvXchgg(n,  v,  N)  3a;.  Stnlq)^,,  v,  x,  N)  *  ls(a;,  Tail)  *  tail  A  (stnl(Head,  n)  V  gls(n,  Head)) 
EnvXchgReadHead^(n,  v,  N,  h)  =*  3a;.  Stnlq(n,  u,  x,  N)  *  ls(a;,  Tail)  *  tail  A  (stnl(/i,  n)  V  gls(n,  h))  A  gls(/i,  Head) 
EnvXchgLagHeadg(n,  v,  N,  h)  3a;.  Stnlq(n,  v,  x,  N)  *  ls(a;,  Tail)  *  tail  A  stnl(/i,  n)  A  gls(/i,  Head) 
EnvXchgNonlagHeadg(n,  u,  A'^)  3a;.  Stnlq(n,  u,  a;,  A'^)  *  ls(a;,  Tail)  *  tail  A  gls(n,  Head) 

Res\/q{t,n,v,v' ,  N)  =*  (t  =  Tail  /\  noc]0^i,  TT-)  ^  Q  V\q(yTh^  U  ^  I1U.H ,  ) 

V  node(t,  n)  *  NewTailg(n,  i?,  A^)  V  node(t,  n)  *  EnvXchgg(n,  A^) 

ResvAdv5(n,  u,  u',  A^)  =*  NewTailq(n,  u,  A'^)  V  EnvXchgg(n,  w',  A^) 

ResvAdvReadDataq(n,  w,  w',  Ur,  A^)  NewTailq(n,  u,  A^)  A  (ur  =  u)  V  EnvXchgg(n,  u',  A^)  A  (ur  =  u' V  Ur  =  u) 

ENqWait  ='  (va  :=  nd.data;  ENQWhile) 

ENQWhile  (while (va=V){  va  :=  nd.data;  }■) 

Figure  35:  Auxiliary  definition  -  I. 
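$$\begin{array}{rcl}
\mathsf{newHead}_p(n, v) &\stackrel{\text{def}}{=}& (\mathsf{stnl}_p(n, v, \mathsf{null}) \wedge (n = \mathsf{Head}) \wedge \mathsf{wf}(1)) \\
&\vee& (\exists x.\ \mathsf{stnl}_p(n, v, x) * \mathsf{qn}(x, \_) \wedge (n = \mathsf{Head}) \wedge \mathsf{wf}(1)) \\
&\vee& (\exists x.\ \mathsf{stnl}_p(n, v, x) * \mathsf{stnl}(x, \_) \wedge (n = \mathsf{Head}) \wedge \mathsf{wf}(2)) \\
&\vee& (\exists x.\ \mathsf{stnl}_p(n, v, x) * \mathsf{gls}(x, \mathsf{Head}) * \mathsf{head} \wedge \mathsf{wf}(2)) \\
\mathsf{newHead}(n) &\stackrel{\text{def}}{=}& \exists p, v.\ \mathsf{newHead}_p(n, v)
\end{array}$$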

readHeadEnvAdVp(/i,  n,  w)  stnl(/i,  n)  *  newHeadp(n,  u) 

readHeadEnvAdVp(/i)  =*  readHeadEnvAdVp(/i,  _,  _)  readHeadEnvAdVp(/i,  n)  =*  readHeadEnvAdVp(/i,  n,  _) 
readHeadp(/i)  (fe  =  Head  A  (stnlp(/i,  null)  V  stnllp(/i)  V  stnl2p(/i)))  V  readHeadEnvAdVp(/i) 

readHeadNextNullEnVp(/i,  n)  =*  (n  =  null)  A  ((k  —  Head  A  stnlp(/i,  x)  *  node(a;,  _)  A  wf(2))  V  readHeadEnvAdVp(/i)) 

readHeadNextp(/i,  n)  =*  {h  —  Head  A  ((stnlp(/i,  n)  An  —  null)  V  stnllp(/i,  n,  _)  V  stnl2p(/i,  n,  _))) 

V  readHeadEnvAdVp(/i,  n)  V  readHeadNextNullEnVp(/i,  n) 

readHeadNextNonnullp(/i,  n)  =*  (/i  =  Head  A  (stnllp(/i,  n,  _)  V  stnl2p(/i,  n,  _)))  V  readHeadEnvAdVp)/),,  n) 

readHeadNextNullp(/i,  n)  (/i  =  Head  A  stnlp(/i,  n)  A  n  =  null)  V  readHeadNextNullEnVp(/i,  n) 

Xchgp(/i,  n,  w)  (/i  =  Head  A  stnl2p(/i,  n,  w))  V  readHeadEnvAdVp(/i,  n,  w) 

Xchgp(/i,n)  Xchgp(/i,n,  _) 


Figure  36:  Auxiliary  definition  -  II. 
1  enq(v) {

2  local  t,  h,  n,  offer,  b,  v’ ; 

{/A  loopinv  *  true  A  arem(ENQ) } 

3  b  :=  false; 

4  offer  :=  new  Node(v,  DATA,  null); 

{ (^b  A  (7  A  loopinv  *  true)  *  nodeDATA(of  f  er,  v,  null)  A  arem(ENQ))  V  (b  A  7  A  arem(sA;ip)) } 

5  while  (!b)  •[ 

{(7  A  loopBody  *  true)  *  nodeDATA(of  f  er,  v,  null)  A  arem(ENQ)  A-ib} 

6  t  :=  tail; 

{  3p.  (Qp  *  Garb  A  loopBody  *  true  A  readTailp(t)  *  true)  *  nodeoAiA (offer,  v,  null)  A  arem(ENQ)  A  “^b} 

7  h  :=  head; 

3p.  (Qp  *  Garb  A  loopBody  true  A  readT'ailp(t)  ^  true  A  readHeadp(h)  ^  true) 

*  nodeDATA(of f er,  v,  null)  A  arem(ENQ)  A  ^b 
if  (h  =  t  II  t.type  =  DATA)  { 
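$\{\exists p.\ (I \wedge \mathsf{loopBody} * \mathsf{true} \wedge \mathsf{readTail}_p(t) * \mathsf{true} \wedge \mathsf{gls}(h, \mathsf{Head}) * \mathsf{true}) * \mathsf{node}_{\mathsf{DATA}}(\mathtt{offer}, v, \mathsf{null}) \wedge \mathsf{arem}(\mathsf{ENQ}) \wedge (h = t \vee p = \mathsf{DATA}) \wedge \neg b\}$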


n  :=  t.next; 

3p.  (7  A  loopBody  *  true  A  readTailNextp(t,  n)  *  true  A  gls(h,  Head)  *  true) 

*  nodeDATA(of  f  er,  v,  null)  A  arem(ENQ)  A  (h  =  t  V  p  =  DATA)  A  “^b 
if  (t  =  tail)  { 

3p.  (7  A  loopBody  *  true  A  readTailNextp(t,  n)  *  true  A  gls(h.  Head)  *  true) 

*  nodeDATA(of  f  er,  v,  null)  A  arem(ENQ)  A  (h  =  t  V  p  =  DATA)  A  ^b 
if  (n  !=  null)  { 

3p.  (7  A  loopBody  *  true  A  readTailNextNonnullp(t,  n)  *  true) ' 

*  nodeDATA(of  f  er,  V,  null)  A  arem(ENQ)  A  ^b 
casC&tail,  t,  n) ; 

{ (7  A  loopinv  *  true)  *  nodeoATA (offer,  v,  null)  A  arem(ENQ)  A  ^b} 

}  else  { 

3p.  (7  A  loopBody  *  true  A  readTailNextNullp(t,  n)  *  true  A  gls(h,  Head)  *  true)  1 

*  nodeDATA  (offer,  v,  null)  A  arem(ENQ)  A  (h  =  t  V  p  =  DATA)  A  ^b  J 

if  (cas (&(t .next) ,  n,  offer)){ 

{  (7  A  ResVDATA(t,  offer,  v,  null,  nd)  *  true)  A  arem(ENQWait)  A  ^b} 
casC&tail,  t,  offer); 

{ (7  A  ResvAdvDATA(of  f  er,  v,  null,  nd)  *  true)  A  arem(ENQWait)  A  ^b} 
v’  :=  offer. data; 

{ (7  A  ResvAdvReadDataDATA(of f er,  v,  null,  v’ ,  nd)  *  true)  A  (v’  =  va)  A  arem(ENQWhile)  A  ^b} 
while  (v’  =  v)  {  v’  :=  offer. data;  } 

{(7  A  EnvXchg;,„j(of  f  er,  null,  nd)  *  true)  A  (v’  =  va  =  null)  A  arem(sfcip)  A  ^b} 
h  :=  head; 

{ (7  A  EnvXchgReadHeado4„(off er,  null,  nd,  h)  *  true)  A  arem(sfcip)  A  ^b} 
if  (offer  =  h.next) 

{ (7  A  EnvXchgLagHeadn„A(°^f  null,  nd,  h)  *  true)  A  arem(sfcip)  A  ^b} 
cas(&head,  h,  offer); 

{ (7  A  EnvXchgNonlagHeadp„j(of f er,  null, nd)  *  true)  A  arem(sfcip)  A  ^b} 
b  : =  true ; 

{b  A  7  A  arem(sfcip) } 


Figure  37:  Proof  outline  -  I. 
else {

3p.  (/  A  loopBody  *  true  A  readTailp(t)  *  true  A  readHeadp(h)  *  true)  1 

*  nodeDATA(of  f  er,  v,  null)  A  arem(ENQ)  A  (h  7^  t  A  p  =  REQ)  A  J 

:=  h.next; 

3p.  (/  A  loopBody  *  true  A  readTailp(t)  *  true  A  readHeadNextp(h,  n)  *  true) ) 

*  nodeDATA(of  f  er,  v,  null)  A  arem(ENQ)  A  (h  7^  t  A  p  =  REQ)  A  J 

if  (t  =  tail  &&  h  =  head  &&  n  !=  null)  { 

(J  A  loopBody  *  true  A  readHeadNextNonnullREQ(h,  n)  *  true) 

*  nodeDATA(of  f  er,  v,  null)  A  arem(ENQ)  A  ^b 
I  :=  cas(&(n.data) ,  null,  v) ; 

b  A  (/  A  loopBody  *  true  A  Xchg[(j.q(h,  n,  v)  *  true)  *  nodeDATA(of  f  er,  v,  null)  A  arem(sfcip)  1 

V  ^b  A  (/  A  loopBody  *  true  A  Xchg^£g(h,  n)  *  true)  *  nodeDATA(off  er,  v,  null)  A  arem(ENQ)  J 
casChead,  h,  n) ; 

(b  A  /  *  nodeDATA (offer,  v,  null)  A  arem(sfeip)) 

V  (^b  A  (/  A  loopinv  *  true)  *  nodeoAiA (offer,  v,  null)  A  arem(ENQ)) 
if  (b)  free(offer); 

{ (^b  A  (7  A  loopinv  *  true)  *  nodeDATA(of  f  er,  v,  null)  A  arem(ENQ))  V  (b  A  7  A  arem(sfcip)) } 

}  else  ■[ 

(7  A  loopBody  *  true  A  (readTailEnvAdvRE()(t)  V  readHeadEnvAdvRE(3(h)  V  readHeadNextNullEnvREQ(h,  n))  *  true) 

*  nodeDATA(of f er,  v, null)  A  (h  7^  t)  A  arem(ENQ)  A  ^b 
{^b  A  (7  A  loopinv  *  true)  *  nodeDATA(of f er,  v,  null)  A  arem(ENQ) } 


{7  A  arem(sfcip) } 


36  } 


Figure  38:  Proof  outline  -  11. 
5 Soundness Proofs


Below we first prove the adequacy of RGSim-T w.r.t. the termination-sensitive refinement (Section 5.1).
Then we define the unary judgment semantics (Section 5.2), and we prove the soundness of the binary
inference rules (Section 5.3), where the binary judgment semantics is just the definition of RGSim-T,
and also prove the soundness of the unary rules (Section 5.4). Finally we show
the derivation of the while-term rule (Section 5.5).


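5.1 Adequacy of RGSim-T

RGSim-T (which is also the binary judgment semantics) implies the termination-sensitive
refinement, both as defined earlier.

Theorem 4 (Adequacy of RGSim-T). If there exist $R$, $G$, $I$, $Q$ and a metric $M$ such that
$R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$, then $(C, \sigma) \sqsubseteq (\mathbb{C}, \Sigma)$.

Proof: We want to prove the following: for any $R$, $G$, $I$, $Q$,

$$\forall \mathbb{C}, \Sigma, \mathcal{E}.\ \big(\exists C, \sigma, M.\ R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma) \wedge \mathit{ETr}(C, \sigma, \mathcal{E})\big) \Longrightarrow \mathit{ETr}(\mathbb{C}, \Sigma, \mathcal{E})$$

By co-induction.

Co-induction Principle: $\forall x.\ (\exists S.\ S \subseteq F(S) \wedge x \in S) \Longrightarrow x \in \mathrm{gfp}\, F$

Here $F$ and $\mathrm{gfp}\, F$ (i.e., $\mathit{ETr}$) are given in the figure that defines $\mathit{ETr}$. Let

$$S = \{(\mathbb{C}, \Sigma, \mathcal{E}) \mid \exists C, \sigma, M.\ R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma) \wedge \mathit{ETr}(C, \sigma, \mathcal{E})\}.$$

So from the co-induction principle, we only need to prove:

$$S \subseteq F(S), \text{ i.e., } \forall \mathbb{C}, \Sigma, \mathcal{E}.\ (\mathbb{C}, \Sigma, \mathcal{E}) \in S \Longrightarrow (\mathbb{C}, \Sigma, \mathcal{E}) \in F(S).$$

After unfolding $S$, we only need to prove:

$$\forall M, \mathbb{C}, \Sigma, \mathcal{E}, C, \sigma.\ R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma) \wedge \mathit{ETr}(C, \sigma, \mathcal{E}) \Longrightarrow (\mathbb{C}, \Sigma, \mathcal{E}) \in F(S). \tag{5.1}$$

By transfinite induction over $M$.

Transfinite Induction Principle: $(\forall M.\ (\forall M'.\ M' < M \Longrightarrow P(M')) \Longrightarrow P(M)) \Longrightarrow \forall M.\ P(M)$

We view (5.1) as $\forall M.\ P(M)$. So we only need to prove:

$$\begin{array}{l}
\forall M.\\
\quad \big(\forall M'.\ M' < M \Longrightarrow (\forall \mathbb{C}', \Sigma', \mathcal{E}', C', \sigma'.\ R, G, I \models (C', \sigma', M') \preceq_Q (\mathbb{C}', \Sigma') \wedge \mathit{ETr}(C', \sigma', \mathcal{E}') \Longrightarrow (\mathbb{C}', \Sigma', \mathcal{E}') \in F(S))\big)\\
\quad \Longrightarrow \big(\forall \mathbb{C}, \Sigma, \mathcal{E}, C, \sigma.\ R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma) \wedge \mathit{ETr}(C, \sigma, \mathcal{E}) \Longrightarrow (\mathbb{C}, \Sigma, \mathcal{E}) \in F(S)\big)
\end{array}$$

By inversion over $\mathit{ETr}(C, \sigma, \mathcal{E})$:

1. $(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')$ and $\mathcal{E} = \Downarrow$:

From $R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$, we know there exists $\Sigma'$ such that $(\mathbb{C}, \Sigma) \longrightarrow^{*} (\mathbf{skip}, \Sigma')$.
Thus from the definition of $F$, we know $(\mathbb{C}, \Sigma, \mathcal{E}) \in F(S)$.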
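2. $(C, \sigma) \longrightarrow^{+} \mathbf{abort}$ and $\mathcal{E} = \lightning$:

From $R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$, we know $(\mathbb{C}, \Sigma) \longrightarrow^{+} \mathbf{abort}$.

Thus from the definition of $F$, we know $(\mathbb{C}, \Sigma, \mathcal{E}) \in F(S)$.

3. $(C, \sigma) \longrightarrow^{+} (C', \sigma')$ and $\mathit{ETr}(C', \sigma', \mathcal{E})$:

From $R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$, we know one of the following two cases holds:

(a) there exist $M'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma) \longrightarrow^{+} (\mathbb{C}', \Sigma')$ and $R, G, I \models (C', \sigma', M') \preceq_Q (\mathbb{C}', \Sigma')$.

Thus $(\mathbb{C}', \Sigma', \mathcal{E}) \in S$. Then from the definition of $F$, we know $(\mathbb{C}, \Sigma, \mathcal{E}) \in F(S)$.

(b) there exists $M'$ such that $M' < M$ and $R, G, I \models (C', \sigma', M') \preceq_Q (\mathbb{C}, \Sigma)$.

Then from the induction hypothesis (instantiated with $M'$, $C'$ and $\sigma'$), we know $(\mathbb{C}, \Sigma, \mathcal{E}) \in F(S)$.

4. $(C, \sigma) \overset{e}{\longrightarrow}{}^{+} (C', \sigma')$, $\mathit{ETr}(C', \sigma', \mathcal{E}')$ and $\mathcal{E} = e :: \mathcal{E}'$:

From $R, G, I \models (C, \sigma, M) \preceq_Q (\mathbb{C}, \Sigma)$, we know:

there exist $\mathbb{C}'$, $\Sigma'$ and $M'$ such that $(\mathbb{C}, \Sigma) \overset{e}{\longrightarrow}{}^{+} (\mathbb{C}', \Sigma')$ and $R, G, I \models (C', \sigma', M') \preceq_Q (\mathbb{C}', \Sigma')$.
Thus $(\mathbb{C}', \Sigma', \mathcal{E}') \in S$. Then from the definition of $F$, we know $(\mathbb{C}, \Sigma, \mathcal{E}) \in F(S)$.

Then we are done. □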
5.2 Unary Judgment Semantics

The unary judgment semantics $R, G, I \models \{p\}C\{q\}$ follows the definition of RGSim-T. The initial abstract
code in the simulation comes from the precondition p, and the postcondition q specifies the final abstract
code that corresponds to the concrete final code skip. The assertions p and q also specify the
while-specific metric w (the number of tokens), which must be related to the metric M used in the simulation
RGSim-T.

Below we first show how we instantiate the abstract metric M in RGSim-T based on w.


5.2.1  Instantiation  of  the  Abstract  Metric  M 


For each single thread, its metric ws (defined below) is a list of $(w, n)$ pairs, where $w$ is the while-specific
metric and $n$ is the “code size”, which will be explained later. We let the threaded metric ws be a list (a stack
actually) to allow different while-specific metrics for nested loops. That is, when entering a loop, we can
push a $(w, n)$ pair onto the ws stack; and when exiting the loop, we pop the pair off ws.

The threaded metric ws uses the dictionary order. However, the usual dictionary order over lists is
not well-founded (consider B > AB > AAB > AAAB > ... in a dictionary). To address this issue, we
introduce a bound $\mathcal{H}$ on the list length (stack height), and define the well-founded order $<_{\mathcal{H}}$ by requiring
that the lists be no longer than $\mathcal{H}$. Intuitively, the stack height $\mathcal{H}$ represents the maximal depth of
nested loops, so it can be determined for any given program.

To get the whole-program metric, we compose threaded metrics by pairing them. Thus the abstract
metric M in RGSim-T is instantiated as follows:

$$M ::= (ws, \mathcal{H}) \mid (M, M)$$

and we define the well-founded order $<$ and the composition operation $+$ (see Lemma 16) as follows:

$$\frac{ws' <_{\mathcal{H}} ws \qquad \mathcal{H}' = \mathcal{H}}{(ws', \mathcal{H}') < (ws, \mathcal{H})}
\qquad
\frac{M_1' < M_1 \qquad M_2' = M_2}{(M_1', M_2') < (M_1, M_2)}
\qquad
\frac{M_1' = M_1 \qquad M_2' < M_2}{(M_1', M_2') < (M_1, M_2)}$$

$$M_1 + M_2 \;\stackrel{\text{def}}{=}\; (M_1, M_2)$$

The threaded metric ws and the well-founded order $<_{\mathcal{H}}$ are defined below. Note that we allow
“A < AB < B” in a dictionary.

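$$(\mathit{WfStack}) \quad ws ::= (w, n) \mid (w, n) :: ws \qquad\qquad (\mathit{StkHeight}) \quad \mathcal{H} \in \mathit{Nat}$$

$$ws' <_{\mathcal{H}} ws \ \text{ iff } \ (ws' \ll ws) \wedge (|ws'| \le \mathcal{H}) \wedge (|ws| \le \mathcal{H})$$

$$\frac{(w', n') < (w, n)}{(w', n') \ll (w, n)}
\qquad
\frac{(w', n') \le (w, n)}{(w', n') \ll (w, n) :: ws_1}
\qquad
\frac{(w', n') < (w, n)}{(w', n') :: ws_1' \ll (w, n)}$$

$$\frac{(w', n') < (w, n)}{(w', n') :: ws_1' \ll (w, n) :: ws_1}
\qquad
\frac{(w', n') = (w, n) \qquad ws_1' \ll ws_1}{(w', n') :: ws_1' \ll (w, n) :: ws_1}$$

Here $|ws|$ is the length of $ws$, which is defined as follows:

$$|(w, n)| = 1 \qquad\qquad |(w, n) :: ws| = 1 + |ws|$$

The well-founded order over the $(w, n)$ pairs is a usual dictionary order:

$$\begin{array}{lcl}
(w', n') < (w, n) & \text{iff} & (w' < w) \vee (w' = w \wedge n' < n) \\
(w', n') = (w, n) & \text{iff} & w' = w \wedge n' = n \\
(w', n') \le (w, n) & \text{iff} & (w', n') < (w, n) \vee (w', n') = (w, n)
\end{array}$$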
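The order on stacks can be made concrete with a small executable sketch. The encoding below is an assumption for illustration only: a stack ws is a non-empty Python list of `(w, n)` tuples with the outermost pair first, and the names `stack_lt` and `bounded_lt` are hypothetical. Python's built-in tuple comparison already is the dictionary order on `(w, n)` pairs.

```python
from typing import List, Tuple

Pair = Tuple[int, int]   # a (w, n) pair
Stack = List[Pair]       # a non-empty ws stack, outermost pair first


def stack_lt(ws1: Stack, ws2: Stack) -> bool:
    """ws1 << ws2: the unbounded dictionary order on non-empty stacks."""
    (h1, *t1), (h2, *t2) = ws1, ws2
    if not t1 and not t2:      # single pair vs. single pair
        return h1 < h2
    if not t1:                 # "A < AB": a prefix sits below its extensions
        return h1 <= h2
    if not t2:                 # "AB < B": needs a strictly smaller head
        return h1 < h2
    return h1 < h2 or (h1 == h2 and stack_lt(t1, t2))


def bounded_lt(ws1: Stack, ws2: Stack, height: int) -> bool:
    """ws1 <_H ws2: the same order restricted to stacks no longer than `height`."""
    return stack_lt(ws1, ws2) and len(ws1) <= height and len(ws2) <= height


A, B = (0, 1), (0, 2)
a_lt_ab = stack_lt([A], [A, B])                       # A < AB
ab_lt_b = stack_lt([A, B], [B])                       # AB < B
aab_lt_ab = stack_lt([A, A, B], [A, B])               # next link of B > AB > AAB > ...
aab_lt_ab_bounded = bounded_lt([A, A, B], [A, B], 2)  # the bound cuts the chain off
```

Without the bound, the chain B > AB > AAB > ... can be extended forever; with a bound of 2 the step to AAB is rejected, which is the intuition behind requiring stacks no longer than the stack height.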
Lemma 5 (Well-foundedness). The relation $M' < M$ defined above is a well-founded relation.
Proof: Easy to prove from the following lemma. □

Lemma 6. The relation $ws' <_{\mathcal{H}} ws$ defined above is a well-founded relation.

Proof: Suppose there is an infinite descending chain:

$$ws_0 >_{\mathcal{H}} ws_1 >_{\mathcal{H}} ws_2 >_{\mathcal{H}} \cdots \tag{5.2}$$

Thus we know

$$ws_0 \gg ws_1 \gg ws_2 \gg \cdots \tag{5.3}$$

and

$$\forall k.\ |ws_k| \le \mathcal{H} \tag{5.4}$$

We prove the following property, which generalizes over the maximum size $\mathcal{H}$ and contradicts (5.4):

$$\forall ws_0, ws_1, ws_2, \ldots.\ (\forall k.\ ws_k \gg ws_{k+1}) \Longrightarrow (\forall m \ge 1.\ \exists j.\ |ws_j| > m) \tag{5.5}$$

By induction over $m$.

• Base Case: $m = 1$. Suppose $\forall k.\ |ws_k| = 1$. Thus we have an infinite descending chain:

$$(w_0, n_0) > (w_1, n_1) > (w_2, n_2) > \cdots \tag{5.6}$$

This contradicts the fact that $(w', n') < (w, n)$ is a well-founded relation.

• Inductive Step: $m = m' + 1$. Since $(w', n') < (w, n)$ is a well-founded relation, we know there
must exist $k$ such that

$$\forall j \ge k.\ \mathsf{root}(ws_j) = \mathsf{root}(ws_{j+1}) \tag{5.7}$$

and there exist $ws_k', ws_{k+1}', \ldots$ such that $\forall j \ge k.\ ws_j = \mathsf{root}(ws_j) :: ws_j'$ and

$$\forall j \ge k.\ ws_j' \gg ws_{j+1}' \tag{5.8}$$

Here $\mathsf{root}(ws)$ takes the first element of $ws$ if $ws$ has a first element, and is undefined otherwise.

From the induction hypothesis, we know there exists $j \ge k$ such that

$$|ws_j'| > m'. \tag{5.9}$$

Thus $|ws_j| > m' + 1$.

So we are done. □

5.2.2  Intuitions  of  H  and  the  Second  Dimension  of  ws 

Below we give more informal explanations (and examples) about the stack height $\mathcal{H}$ and the second
dimension (“code size” $n$ in each pair) of the threaded metric ws.

As we said, the stack height $\mathcal{H}$ represents the maximal depth of nested loops. For any given program
C, we can determine the stack height using a function height defined in Figure 39.

The threaded metric ws as a stack requires us to distinguish the executions of the loop body from the
executions of the code outside the loop. When entering a loop (for the first time), we can push a $(w, n)$
pair onto the ws stack. But when we repeatedly execute the loop body (not for the first time), we do not
want to push a new pair onto the stack.

Thus we introduce the runtime command while (B){C} to represent the while-loop continuation when
we have unfolded the loop while (B) C. We revise the low-level operational semantics as follows:
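$$\begin{array}{lcl}
\mathsf{height}(\mathbf{skip}) &=& 1 \\
\mathsf{height}(c) &=& 1 \\
\mathsf{height}(\langle C \rangle) &=& 1 \\
\mathsf{height}(C_1; C_2) &=& \max\{\mathsf{height}(C_1), \mathsf{height}(C_2)\} \\
\mathsf{height}(\mathbf{if}\ (B)\ C_1\ \mathbf{else}\ C_2) &=& \max\{\mathsf{height}(C_1), \mathsf{height}(C_2)\} \\
\mathsf{height}(\mathbf{while}\ (B)\ C) &=& \mathsf{height}(C) + 1
\end{array}$$

Figure 39: Definition of height.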


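$$\frac{[\![B]\!]_s = \mathbf{true}}{(\mathbf{while}\ (B)\ C, (s, h)) \longrightarrow (C; \mathbf{while}\ (B)\{C\}, (s, h))}
\qquad
\frac{[\![B]\!]_s = \mathbf{false}}{(\mathbf{while}\ (B)\ C, (s, h)) \longrightarrow (\mathbf{skip}, (s, h))}$$

$$\frac{[\![B]\!]_s = \mathbf{true}}{(\mathbf{while}\ (B)\{C\}, (s, h)) \longrightarrow (C; \mathbf{while}\ (B)\{C\}, (s, h))}
\qquad
\frac{[\![B]\!]_s = \mathbf{false}}{(\mathbf{while}\ (B)\{C\}, (s, h)) \longrightarrow (\mathbf{skip}, (s, h))}$$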

We can see that the new operational semantics for while loops is equivalent to the original one (see
the figure giving the original semantics). Below we will assume the new semantics and use it to prove the logic soundness. However,
we want the readers to note that without the new operational semantics, we can still define the unary
judgment semantics and prove the soundness of all the inference rules, based on the original operational
semantics. The new operational semantics for while loops just makes the proofs (and the intuition)
clearer, in particular, for the hide-w rule, the rule for “locally” reasoning about nested while loops.
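To illustrate the revised rules, here is a minimal small-step interpreter sketch. Everything about the encoding is an assumption made for this example: commands are Python tuples, `"rt_while"` stands for the runtime command while (B){C}, conditions and assignments are Python functions over a dictionary state, and the helper names `step` and `run` are hypothetical.

```python
def step(cmd, state):
    """Perform one small step and return the next (command, state) configuration."""
    tag = cmd[0]
    if tag == "assign":                       # ("assign", variable, function of state)
        _, var, fn = cmd
        new_state = dict(state)
        new_state[var] = fn(state)
        return ("skip",), new_state
    if tag == "seq":                          # ("seq", c1, c2)
        _, c1, c2 = cmd
        if c1 == ("skip",):                   # (skip; C, s) --> (C, s)
            return c2, state
        c1_next, state_next = step(c1, state)
        return ("seq", c1_next, c2), state_next
    if tag in ("while", "rt_while"):          # source loop or its runtime continuation
        _, cond, body = cmd
        if cond(state):                       # unfold: body followed by the runtime loop
            return ("seq", body, ("rt_while", cond, body)), state
        return ("skip",), state
    raise ValueError(f"cannot step {tag!r}")


def run(cmd, state):
    """Collect every configuration until the command becomes skip."""
    trace = [(cmd, state)]
    while cmd != ("skip",):
        cmd, state = step(cmd, state)
        trace.append((cmd, state))
    return trace


# while (i > 0) i--, started with i = 2
loop = ("while", lambda s: s["i"] > 0, ("assign", "i", lambda s: s["i"] - 1))
trace = run(loop, {"i": 2})
tags = [cmd[0] for cmd, _ in trace]
values = [st["i"] for _, st in trace]
```

Only the very first configuration carries the source-level `"while"`; every later loop test is performed on `"rt_while"`, which is what lets a metric push a new pair when the loop is first entered and not on later iterations.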

With the runtime while (B){C}, we can calculate the code size $n$ in each $(w, n)$ pair of ws. We first
label the code such that different layers of a nested while loop are assigned different labels.


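Labeling the Code  The syntax of the labeled code is defined below. Its operational semantics is
straightforward, as shown in Figure 40.

$$(\mathit{Label}) \quad l \in \mathit{Nat}$$

$$(\mathit{LabStmt}) \quad C ::= \mathbf{skip}^l \mid c^l \mid \langle C \rangle^l \mid C_1; C_2 \mid \mathbf{if}^l (B)\ C_1\ \mathbf{else}\ C_2 \mid \mathbf{while}^l (B)\ C \mid \mathbf{while}^l (B)\{C\}$$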


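We label the low-level code in the following way. Note that we do not need to label the runtime
command while (B){C}, whose label is known during the runtime execution.

$$\begin{array}{lcl}
\mathsf{labeling}(\mathbf{skip}, l) &=& \mathbf{skip}^l \\
\mathsf{labeling}(c, l) &=& c^l \\
\mathsf{labeling}(\langle C \rangle, l) &=& \langle C \rangle^l \\
\mathsf{labeling}(C_1; C_2, l) &=& \mathsf{labeling}(C_1, l);\ \mathsf{labeling}(C_2, l) \\
\mathsf{labeling}(\mathbf{if}\ (B)\ C_1\ \mathbf{else}\ C_2, l) &=& \mathbf{if}^l (B)\ \mathsf{labeling}(C_1, l)\ \mathbf{else}\ \mathsf{labeling}(C_2, l) \\
\mathsf{labeling}(\mathbf{while}\ (B)\ C, l) &=& \mathbf{while}^l (B)\ \mathsf{labeling}(C, l + 1)
\end{array}$$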


We define the functions label, toplabel, minlabel and maxlabel in Figure 41. Then the stack height $\mathcal{H}$
of C is actually the maximum label of C, which is obtained by labeling C with 1. That is, the following
holds:

$$\mathsf{height}(C) = \mathsf{maxlabel}(\mathsf{labeling}(C, 1))$$
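The identity above can be checked on concrete programs with a short sketch. The tuple encoding of statements and the Python function names are assumptions made for this illustration; the three functions follow the definitions of height, labeling and maxlabel as read from the text and the figures.

```python
# Unlabeled statements (hypothetical encoding):
#   ("skip",) | ("prim", text) | ("atomic", C) | ("seq", C1, C2)
#   | ("if", B, C1, C2) | ("while", B, C)
# Labeled statements carry the label as their last component.

def height(c):
    tag = c[0]
    if tag in ("skip", "prim", "atomic"):
        return 1
    if tag == "seq":
        return max(height(c[1]), height(c[2]))
    if tag == "if":
        return max(height(c[2]), height(c[3]))
    if tag == "while":
        return height(c[2]) + 1
    raise ValueError(tag)


def labeling(c, l):
    tag = c[0]
    if tag == "skip":
        return ("skip", l)
    if tag in ("prim", "atomic"):
        return (tag, c[1], l)
    if tag == "seq":                      # a sequence itself carries no label
        return ("seq", labeling(c[1], l), labeling(c[2], l))
    if tag == "if":
        return ("if", c[1], labeling(c[2], l), labeling(c[3], l), l)
    if tag == "while":                    # the loop body moves one level deeper
        return ("while", c[1], labeling(c[2], l + 1), l)
    raise ValueError(tag)


def maxlabel(lc):
    tag = lc[0]
    if tag in ("skip", "prim", "atomic"):
        return lc[-1]
    if tag == "seq":
        return max(maxlabel(lc[1]), maxlabel(lc[2]))
    if tag == "if":
        return max(maxlabel(lc[2]), maxlabel(lc[3]))
    if tag == "while":
        return maxlabel(lc[2])
    raise ValueError(tag)


# i:=2; while (i>0) { j:=1; while (j>0) { j-- }; i-- }
inner = ("while", "j>0", ("prim", "j--"))
body = ("seq", ("prim", "j:=1"), ("seq", inner, ("prim", "i--")))
nested = ("seq", ("prim", "i:=2"), ("while", "i>0", body))

nested_height = height(nested)
nested_maxlabel = maxlabel(labeling(nested, 1))
```

For the doubly nested loop both computations give 3, which is also the deepest stack that can arise while executing it.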

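We can prove the following property.

Lemma 7. For any $C$, $\hat{C}$, $\hat{C}'$, $\sigma$, $\sigma'$ and $R$, if $\mathsf{labeling}(C, 1) = \hat{C}$ and $(\hat{C}, \sigma) \longmapsto_R^{*} (\hat{C}', \sigma')$, then there
exist $l$, $C_1, \ldots, C_l$ such that $\hat{C}' = (C_l; \ldots; C_1)$ and $\forall i \in [1..l].\ \mathsf{label}(C_i) = i$.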
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IBlIs  =  true 

(while' (B)  d,  {s,  h))  — >  (C;  while'  (B)  C,  (s,  h)) 
|[i3|s  =  true 

(while'(B)  d,  {s,h))  — >  ((7;while'(B)  0,  {s,h)) 

{d,a)^id',a') 

Figure  40:  Selected  operational 


JBIIs  =  false 

(while' (B)  C,  (s,  h))  — >•  (skip',  {s,  h)) 
|[B|s  =  false 

(while'(B)  C,{s,h))  — (skip',  (s, /i)) 

(skip';  C',cr)  — )■  {O',  a) 

((a,S),(a',S'),b) 

id,a)A{d,a') 


semantics  rules  of  the  labeled  language. 


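$$\begin{array}{lcl}
\mathsf{label}(\mathbf{skip}^l) &=& l \\
\mathsf{label}(c^l) &=& l \\
\mathsf{label}(\langle C \rangle^l) &=& l \\
\mathsf{label}(C_1; C_2) &=& \begin{cases} \mathsf{label}(C_1) & \text{if } \mathsf{label}(C_1) = \mathsf{label}(C_2) \\ \text{undefined} & \text{otherwise} \end{cases} \\
\mathsf{label}(\mathbf{if}^l (B)\ C_1\ \mathbf{else}\ C_2) &=& l \\
\mathsf{label}(\mathbf{while}^l (B)\ C) &=& l \\
\mathsf{label}(\mathbf{while}^l (B)\{C\}) &=& l
\end{array}$$

$$\begin{array}{lcl}
\mathsf{minlabel}(\mathbf{skip}^l) &=& l \\
\mathsf{minlabel}(c^l) &=& l \\
\mathsf{minlabel}(\langle C \rangle^l) &=& l \\
\mathsf{minlabel}(C_1; C_2) &=& \mathsf{minlabel}(C_2) \\
\mathsf{minlabel}(\mathbf{if}^l (B)\ C_1\ \mathbf{else}\ C_2) &=& l \\
\mathsf{minlabel}(\mathbf{while}^l (B)\ C) &=& l \\
\mathsf{minlabel}(\mathbf{while}^l (B)\{C\}) &=& l
\end{array}$$

$$\begin{array}{lcl}
\mathsf{maxlabel}(\mathbf{skip}^l) &=& l \\
\mathsf{maxlabel}(c^l) &=& l \\
\mathsf{maxlabel}(\langle C \rangle^l) &=& l \\
\mathsf{maxlabel}(C_1; C_2) &=& \max\{\mathsf{maxlabel}(C_1), \mathsf{maxlabel}(C_2)\} \\
\mathsf{maxlabel}(\mathbf{if}^l (B)\ C_1\ \mathbf{else}\ C_2) &=& \max\{\mathsf{maxlabel}(C_1), \mathsf{maxlabel}(C_2)\} \\
\mathsf{maxlabel}(\mathbf{while}^l (B)\ C) &=& \mathsf{maxlabel}(C)
\end{array}$$

Figure 41: Functions on labeled code.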
It says that, at any time in the execution of C, the runtime code must be in the form of $C_l; C_{l-1}; \ldots; C_1$, where
each $C_i$ has a fixed label $i$.


Code Sizes for Labeled Code  For each pair $(w, n)$ in any ws, $n$ can be statically determined by the
code. We use $\mathsf{proj}_2(ws)$ to project each pair $(w, n)$ in ws to $n$. $\mathsf{proj}_1(ws)$ is defined similarly.

$$ns ::= n \mid n :: ns$$

$$\mathsf{proj}_2(w, n) = n \qquad\qquad \mathsf{proj}_2((w, n) :: ws) = n :: \mathsf{proj}_2(ws)$$


We  use  |C]  to  compute  a  list  of  code  sizes  for  C.  Then 

proJ2(ws)  =  [C],  where  C  is  some  run-time  labeled  code  and  ws  is  the  metric  for  C. 
We  define  |C]  as  follows. 


Iskip'l 

lCi-,C2i 

|if'(B)  Cl  else  Ca] 
|[whiW(B)  C] 
|while'(B)  Cl 


=  0 
=  1 
=  1 

_  I  [CilSlCal©!  if  minlabel(Ci)  =  labellCa) 
y  IC2I ::  (|[Cil  ©  1)  if  minlabel(Ci)  >  labellCa) 
=  maallCil,  IC2I}  +  1 
=  1 
=  0::0 


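Here the static size of commands $|C|$ is defined as follows.

$$\begin{array}{lcl}
|\mathbf{skip}^l| &=& 0 \\
|c^l| &=& 1 \\
|\langle C \rangle^l| &=& 1 \\
|C_1; C_2| &=& |C_1| + |C_2| + 1 \\
|\mathbf{if}^l (B)\ C_1\ \mathbf{else}\ C_2| &=& \max\{|C_1|, |C_2|\} + 1 \\
|\mathbf{while}^l (B)\ C| &=& 1 \\
|\mathbf{while}^l (B)\{C\}| &=& 0
\end{array}$$

And $ns \oplus n$ is defined as follows:

$$ns \oplus n \;\stackrel{\text{def}}{=}\; \begin{cases}
n_1 + n & \text{if } ns = n_1 \\
(n_1 + n) :: ns' & \text{if } ns = n_1 :: ns' \\
\text{undefined} & \text{otherwise}
\end{cases}$$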
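As a quick sanity check of the static size, the sketch below computes $|C|$ for the nested-loop program used in the examples of this section. The tuple encoding is an assumption made for this illustration (labels are omitted because the static size does not depend on them), and `"rt_while"` stands for the runtime command while (B){C}.

```python
def size(c):
    """Static size |C| of a command in a hypothetical tuple encoding."""
    tag = c[0]
    if tag in ("skip", "rt_while"):        # skip and the runtime loop cost nothing
        return 0
    if tag in ("prim", "atomic", "while"): # primitives, atomic blocks, source loops
        return 1
    if tag == "seq":
        return size(c[1]) + size(c[2]) + 1
    if tag == "if":
        return max(size(c[2]), size(c[3])) + 1
    raise ValueError(tag)


# i:=2; while (i>0) { j:=1; while (j>0) { j-- }; i-- }
inner = ("while", "j>0", ("prim", "j--"))
body = ("seq", ("prim", "j:=1"), ("seq", inner, ("prim", "i--")))
program = ("seq", ("prim", "i:=2"), ("while", "i>0", body))

# The loop body followed by the runtime continuation of the outer loop.
unfolded = ("seq", body, ("rt_while", "i>0", body))

program_size = size(program)
unfolded_size = size(unfolded)
```

The two results, 3 and 6, agree with the second components of the pairs $(0,3)$ and $(1,6)$ that the nested-loop example assigns to the initial program and to the first unfolding of the outer loop.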


Examples  of  ws  Below  we  use  a  few  simple  examples  to  show  how  ws  changes  during  an  execution. 
The  second  dimension  of  the  ws  for  the  runtime  labeled  code  C  coincides  with  the  above  definition  [C] . 


        C                                      σ        ws
1       while^1 (i > 0) i--^2                  i = 2    (0,1)
2   →   i--^2; while^1 (i > 0){i--^2}          i = 2    (0,0)::(1,2)
3   →   skip^2; while^1 (i > 0){i--^2}         i = 1    (0,0)::(1,1)
4   →   while^1 (i > 0){i--^2}                 i = 1    (0,0)::(1,0)
5   →   i--^2; while^1 (i > 0){i--^2}          i = 1    (0,0)::(0,2)
6   →   skip^2; while^1 (i > 0){i--^2}         i = 0    (0,0)::(0,1)
7   →   while^1 (i > 0){i--^2}                 i = 0    (0,0)::(0,0)
8   →   skip^1                                 i = 0    (0,0)
        C                                                                        σ               ws
1       i:=2^1; while^1 (i>0){ j:=1^2; while^2 (j>0){ j--^3 }; i--^2 }            i = 0, j = 0    (0,3)
2   →   skip^1; while^1 (i>0){ j:=1^2; while^2 (j>0){ j--^3 }; i--^2 }            i = 2, j = 0    (0,2)
3   →   while^1 (i>0){ j:=1^2; while^2 (j>0){ j--^3 }; i--^2 }                    i = 2, j = 0    (0,1)
4   →   j:=1^2; while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }               i = 2, j = 0    (0,0)::(1,6)
5   →   skip^2; while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }               i = 2, j = 1    (0,0)::(1,5)
6   →   while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }                       i = 2, j = 1    (0,0)::(1,4)
7   →   j--^3; while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }                i = 2, j = 1    (0,0)::(1,3)::(0,2)
8   →   skip^3; while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }               i = 2, j = 0    (0,0)::(1,3)::(0,1)
9   →   while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }                       i = 2, j = 0    (0,0)::(1,3)::(0,0)
10  →   skip^2; i--^2; while^1 (i>0){ ... }                                       i = 2, j = 0    (0,0)::(1,3)
11  →   i--^2; while^1 (i>0){ ... }                                               i = 2, j = 0    (0,0)::(1,2)
12  →   skip^2; while^1 (i>0){ ... }                                              i = 1, j = 0    (0,0)::(1,1)
13  →   while^1 (i>0){ j:=1^2; while^2 (j>0){ j--^3 }; i--^2 }                    i = 1, j = 0    (0,0)::(1,0)
14  →   j:=1^2; while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }               i = 1, j = 0    (0,0)::(0,6)
15  →   skip^2; while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }               i = 1, j = 1    (0,0)::(0,5)
16  →   while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }                       i = 1, j = 1    (0,0)::(0,4)
17  →   j--^3; while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }                i = 1, j = 1    (0,0)::(0,3)::(0,2)
18  →   skip^3; while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }               i = 1, j = 0    (0,0)::(0,3)::(0,1)
19  →   while^2 (j>0){ j--^3 }; i--^2; while^1 (i>0){ ... }                       i = 1, j = 0    (0,0)::(0,3)::(0,0)
20  →   skip^2; i--^2; while^1 (i>0){ ... }                                       i = 1, j = 0    (0,0)::(0,3)
21  →   i--^2; while^1 (i>0){ ... }                                               i = 1, j = 0    (0,0)::(0,2)
22  →   skip^2; while^1 (i>0){ ... }                                              i = 0, j = 0    (0,0)::(0,1)
23  →   while^1 (i>0){ j:=1^2; while^2 (j>0){ j--^3 }; i--^2 }                    i = 0, j = 0    (0,0)::(0,0)
24  →   skip^1                                                                    i = 0, j = 0    (0,0)
The next example is a loop that uses the counter. It involves environment steps, denoted by R and
defined in Section 4.1. When the environment updates x (see line 7), we increase the number of tokens
by 1, i.e., w at the outermost pair of the stack ws is increased from 0 to 1.


         C                                                                                     σ                                   ws
1        while^1 (i > 0){ b:=false^2;                                                          x = 5, i = 1, b = false, t = 0      (0,1)
           while^2 (!b){ t:=x^3; b:=cas(&x,t,t+1)^3; if^3 (b) i--^3 }; }
2   →    b:=false^2; while^2 (!b){ ... }; while^1 (i > 0){ ... }                                                                   (0,0)::(0,4)
3   →    skip^2; while^2 (!b){ ... }; while^1 (i > 0){ ... }                                                                       (0,0)::(0,3)
4   →    while^2 (!b){ ... }; while^1 (i > 0){ ... }                                                                               (0,0)::(0,2)
5   →    t:=x^3; b:=cas(&x,t,t+1)^3; if^3 (b) i--^3;                                                                               (0,0)::(0,1)::(0,7)
           while^2 (!b){ ... }; while^1 (i > 0){ ... }
6   →    skip^3; b:=cas(&x,t,t+1)^3; if^3 (b) i--^3;                                           x = 5, t = 5                        (0,0)::(0,1)::(0,6)
           while^2 (!b){ ... }; while^1 (i > 0){ ... }
7   R                                                                                          x = 8, ...                          (0,0)::(0,1)::(1,6)
8   →*   while^2 (!b){ ... }; while^1 (i > 0){ ... }                                           x = 8, i = 1, b = false, t = 5      (0,0)::(0,1)::(1,0)
9   →    t:=x^3; b:=cas(&x,t,t+1)^3; if^3 (b) i--^3;                                                                               (0,0)::(0,1)::(0,7)
           while^2 (!b){ ... }; while^1 (i > 0){ ... }
10  →*   while^2 (!b){ ... }; while^1 (i > 0){ ... }                                           x = 8, i = 0, b = true, t = 8       (0,0)::(0,1)::(0,0)
11  →    skip^2; while^1 (i > 0){ ... }                                                                                            (0,0)::(0,1)
12  →    while^1 (i > 0){ ... }                                                                                                    (0,0)::(0,0)
13  →    skip^1                                                                                                                    (0,0)
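The ws column of this example can be replayed mechanically. The sketch below is illustrative only: it encodes each ws as a Python list of `(w, n)` tuples, uses a hypothetical helper `stack_lt` for the dictionary order on stacks (a prefix sits below its extensions, as in “A < AB < B”), and tags each row as a program step or an environment step. The metric values are copied from the table.

```python
def stack_lt(ws1, ws2):
    """ws1 << ws2 on non-empty lists of (w, n) tuples, outermost pair first."""
    (h1, *t1), (h2, *t2) = ws1, ws2
    if not t1 and not t2:
        return h1 < h2
    if not t1:                      # shorter stack with a head that is not larger
        return h1 <= h2
    if not t2:                      # longer stack needs a strictly smaller head
        return h1 < h2
    return h1 < h2 or (h1 == h2 and stack_lt(t1, t2))


# (kind of step leading to this row, ws after the step)
rows = [
    ("start", [(0, 1)]),
    ("prog", [(0, 0), (0, 4)]),
    ("prog", [(0, 0), (0, 3)]),
    ("prog", [(0, 0), (0, 2)]),
    ("prog", [(0, 0), (0, 1), (0, 7)]),
    ("prog", [(0, 0), (0, 1), (0, 6)]),
    ("env",  [(0, 0), (0, 1), (1, 6)]),   # line 7: the environment updates x
    ("prog", [(0, 0), (0, 1), (1, 0)]),
    ("prog", [(0, 0), (0, 1), (0, 7)]),
    ("prog", [(0, 0), (0, 1), (0, 0)]),
    ("prog", [(0, 0), (0, 1)]),
    ("prog", [(0, 0), (0, 0)]),
    ("prog", [(0, 0)]),
]

program_steps_decrease = all(
    stack_lt(cur, prev)
    for (_, prev), (kind, cur) in zip(rows, rows[1:])
    if kind == "prog"
)
env_steps_increase = all(
    stack_lt(prev, cur)
    for (_, prev), (kind, cur) in zip(rows, rows[1:])
    if kind == "env"
)
```

Every program step strictly decreases the metric, while the single environment step raises it by granting one more token, which is what allows the retry of the cas loop at line 9 to be paid for.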

Note that in this section we assume that the outer loop and the inner loop each uses a “local”
while-specific metric w. The intuition explained here actually shows how we prove the soundness of the while-l
rule. For the while rule, we use a “global” while-specific metric, and hence the depth of ws could be
just 1 and we do not need to push a new $(w, n)$ pair whenever entering a loop. In this case, the second
dimension of ws, i.e., the size of the code, will count in the runtime while command while (B){C} too.
We show a simple example below, where the stack ws is always of depth 1.

        C                                      σ        ws
1       while^1 (i > 0) i--^2                  i = 2    (2,1)
2   →   i--^2; while^1 (i > 0){i--^2}          i = 2    (1,3)
3   →   skip^2; while^1 (i > 0){i--^2}         i = 1    (1,2)
4   →   while^1 (i > 0){i--^2}                 i = 1    (1,0)
5   →   i--^2; while^1 (i > 0){i--^2}          i = 1    (0,3)
6   →   skip^2; while^1 (i > 0){i--^2}         i = 0    (0,2)
7   →   while^1 (i > 0){i--^2}                 i = 0    (0,1)
8   →   skip^1                                 i = 0    (0,0)
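With a stack of depth 1 the metric is a single $(w, n)$ pair, so termination of this example reduces to a strictly decreasing sequence of pairs under the dictionary order. A minimal check, using the values from the table and relying on the fact that Python compares tuples in dictionary order (the name `strictly_decreasing` is hypothetical):

```python
# The ws column of the depth-1 example, one (w, n) pair per row.
global_trace = [(2, 1), (1, 3), (1, 2), (1, 0), (0, 3), (0, 2), (0, 1), (0, 0)]


def strictly_decreasing(pairs):
    """True when each pair is strictly below its predecessor in dictionary order."""
    return all(later < earlier for earlier, later in zip(pairs, pairs[1:]))


global_trace_ok = strictly_decreasing(global_trace)
```

Note how w drops exactly when the loop is unfolded, which lets the code-size component n be reset to a larger value without breaking the descent.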
5.2.3 Unary Judgment Semantics

Definition 8. $R, G, I \models \{p\}C\{q\}$ iff

for  all  cr,  w,  D  and  E,  if  (cr,  w,  D,  E)  |=  p,  then  R,G,I  \=  {G,  ct,  (0,  IC'D)  ^height(C);u);g  (D,  E). 

Whenever  R,G,I  ^  (C,  cr,  tvs)  diH\w-q  (D,  E),  then  (cr,  E)  ^  *  true  and  the  following  are  true: 

1.  for  any  ap,  ^f,  C  and  a",  if  {G^a^up)  — ^  (C',cr")  and  E-LE/t-,  then  there  exists  a'  such  that 
a"  =  cr'  l±)  (Ti?  and  one  of  the  following  holds: 

(a)  either,  there  exist  ws' ,  w' ,  C'  and  E'  such  that  (D,  E  l±)  E^ )  — >•+  (C',  E'  l±)  Ej?), 

((cr,  E),  (cr',  E'),  true)  ^  G'+  *  True  and  R,  G,  I  |=  (C',  cr',  ws')  din-,w'-,q  (C',  E'); 

(b)  or,  there  exists  ws'  such  that  ws'  <u  ws, 

((cr,  E),  (cr',  E),  false)  |=  G+  *  True  and  R,G,I  \=  {G' ,  cr',  ws')  din\w-q  (D,  E); 

2.  for  any  ap,  T,f,  e,  G'  and  a" ,  if  (C,  crttluF)  — ^  (C',cr")  and  E_LEf,  then 

there  exist  cr',  ws' ,  w',  C'  and  E'  such  that  cr"  =  cr'  l±)  ap,  (D,  E  l±)  Up)  — (C',  E'  l±)  Ef), 

((cr,  E),  (cr',  E'),  true)  |=  G~^  *  True  and  R,G,I  \=  {G' ,  o' ,  ws')  din\w’-,q  (C',  E'); 

3.  for  any  cr'  and  E',  if  ((cr,  E),  (cr',  E'),  true)  \=  *  Id,  then 

there  exist  ws'  and  w'  such  that  R,  G,  I  \=  (C,  cr',  ws')  diH;w'\q  (D,  E'); 

4.  for  any  cr'  and  E',  if  ((cr,  E),  (cr',  E'),  false)  \=  *  Id,  then 

R,  G,I^{G,a',ws)<n-,w,q  (D,S'); 

5.  if  C  =  skip,  then  for  any  Ef,  if  E_LEf,  one  of  the  following  holds: 

(a)  either,  there  exist  w' ,  C'  and  E'  such  that  (D,  E  l±)  Ef)  — (C',  E'  l±)  Ef), 

((cr,  E),  (cr,  E'),  true)  \=  G+  *  True  and  (cr,  w' ,  C',  E')  (=  q; 

(b)  or,  there  exists  w'  such  that  ws  =  [w' ,  0)  and  (cr,  w  +  w' ,  D,  E)  \=  q; 

6.  for  any  ap  and  Ef,  if  (G,  a  l±)  ap)  — abort  and  E_LEf,  then  (D,  E  l±)  Ef)  — abort. 
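Definition  9  (SL  Judgment  Semantics). 

$\models_{\mathrm{SL}} [p]C[q]$  iff,  for  all  $\sigma$,  $w$,  $\mathbb{C}$  and  $\Sigma$,  if  $(\sigma, w, \mathbb{C}, \Sigma) \models p$,  the  following  are  true: 

1.  for  any  $\sigma'$,  if  $(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')$,  then  $(\sigma', w, \mathbb{C}, \Sigma) \models q$; 

2.  $(C, \sigma) \not\longrightarrow^{*} \mathbf{abort}$; 

3.  $(C, \sigma) \not\longrightarrow^{\omega} \cdot$  (that  is,  $C$  does  not  diverge  from  $\sigma$). 

$\models_{\mathrm{SL}} [P]\mathbb{C}[Q]$  iff,  for  any  $\sigma$  and  $\Sigma$,  if  $(\sigma, \Sigma) \models P$,  the  following  are  true: 

1.  for  any  $\Sigma'$,  if  $(\mathbb{C}, \Sigma) \longrightarrow^{*} (\mathbf{skip}, \Sigma')$,  then  $(\sigma, \Sigma') \models Q$; 

2.  $(\mathbb{C}, \Sigma) \not\longrightarrow^{*} \mathbf{abort}$; 

3.  $(\mathbb{C}, \Sigma) \not\longrightarrow^{\omega} \cdot$. 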

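Definition  10  (Locality). 

Locality($C$)  iff,  for  any  $\sigma_1$  and  $\sigma_2$,  let  $\sigma = \sigma_1 \uplus \sigma_2$,  then  the  following  hold: 

1.  (Safety  monotonicity)  If  $(C, \sigma_1) \not\longrightarrow^{*} \mathbf{abort}$,  then  $(C, \sigma) \not\longrightarrow^{*} \mathbf{abort}$. 

2.  (Termination  monotonicity)  If  $(C, \sigma_1) \not\longrightarrow^{*} \mathbf{abort}$  and  $(C, \sigma_1) \not\longrightarrow^{\omega} \cdot$,  then  $(C, \sigma) \not\longrightarrow^{\omega} \cdot$. 

3.  (Frame  property)  For  any  $n$  and  $\sigma'$,  if  $(C, \sigma_1) \not\longrightarrow^{*} \mathbf{abort}$  and  $(C, \sigma) \longrightarrow^{n} (C', \sigma')$,  then  there  exists 
$\sigma_1'$  such  that  $\sigma' = \sigma_1' \uplus \sigma_2$  and  $(C, \sigma_1) \longrightarrow^{n} (C', \sigma_1')$. 

Locality($\mathbb{C}$)  is  defined  similarly. 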
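
As an informal illustration of Definition 10, the following Python sketch checks safety monotonicity and the frame property for a single toy command. Everything in it is an assumption made for illustration: states are modelled as dictionaries, `step_store` is a hypothetical one-step heap update that aborts on an unallocated address, and `check_locality` is a hypothetical helper. Termination monotonicity is not checked because the toy command always finishes in one step.

```python
ABORT = "abort"  # hypothetical marker for the abort configuration


def disjoint_union(h1, h2):
    """Return the disjoint union of two heaps, or None if their domains overlap."""
    if h1.keys() & h2.keys():
        return None
    return {**h1, **h2}


def step_store(addr, value, heap):
    """One step of the toy command [addr] := value; aborts on an unallocated address."""
    if addr not in heap:
        return ABORT
    new_heap = dict(heap)
    new_heap[addr] = value
    return new_heap


def check_locality(addr, value, small, frame):
    """Check safety monotonicity and the frame property for one (small, frame) pair."""
    big = disjoint_union(small, frame)
    if big is None:
        raise ValueError("heaps must be disjoint")
    small_result = step_store(addr, value, small)
    if small_result == ABORT:
        return True  # both properties are vacuous when the small heap aborts
    big_result = step_store(addr, value, big)
    if big_result == ABORT:
        return False  # safety monotonicity would be violated
    # Frame property: the step on the big heap leaves the frame untouched.
    return big_result == disjoint_union(small_result, frame)
```

For instance, `check_locality(1, 9, {1: 0}, {2: 5})` exercises the non-vacuous case: the store succeeds on the small heap, so it must also succeed on the extended heap and leave the frame `{2: 5}` unchanged.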
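5.3  Soundness  of  Binary  Rules 

Lemma  11.  If  $R, G, I \vdash \{P\}C \preceq \mathbb{C}\{Q\}$,  then  $I \triangleright \{R, G\}$,  $P \lor Q \Rightarrow I * \mathsf{true}$  and  $\mathsf{Sta}(\{P, Q\}, R * \mathsf{Id})$. 

Proof:  By  induction  over  the  derivation  of  $R, G, I \vdash \{P\}C \preceq \mathbb{C}\{Q\}$,  and  by  Lemma  27.  For  the  stability, 
we  need  Lemmas  12,  13  and  14.  □ 

Lemma  12.  If  $\mathsf{Sta}(p \land B, R * \mathsf{Id})$,  $\mathsf{Sta}(p \land \lnot B, R * \mathsf{Id})$  and  $p \Rightarrow (B = B)$,  then  $\mathsf{Sta}(p, R * \mathsf{Id})$. 

Lemma  13.  If  $\mathsf{Sta}(p, R * \mathsf{Id})$,  $p \Rightarrow (B = B) * I$  and  $I \triangleright R$,  then  $\mathsf{Sta}(p \land B, R * \mathsf{Id})$. 

Lemma  14.  If  $\mathsf{Sta}(p_1, R_1 * \mathsf{Id})$,  $\mathsf{Sta}(p_2, R_2 * \mathsf{Id})$,  $I_1 \triangleright R_1$,  $I_2 \triangleright R_2$,  $p_1 \Rightarrow I_1 * \mathsf{true}$,  $p_2 \Rightarrow I_2 * \mathsf{true}$,  then 
$\mathsf{Sta}(p_1 * p_2, R_1 * R_2 * \mathsf{Id})$. 

The  B-PAR  rule.  We  define  $M_1 + M_2$  as  a  pair  $(M_1, M_2)$.  The  corresponding  well-founded  order 
satisfies  the  following: 

\[ (M_1 < M_2) \Rightarrow (M_1 + M_3 < M_2 + M_3) \tag{5.10} \]

\[ (M_1 < M_2) \Rightarrow (M_3 + M_1 < M_3 + M_2) \tag{5.11} \]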
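
The text only requires that the order on pairs satisfy (5.10) and (5.11); it does not fix a particular order. The sketch below assumes one possible instance, the strict product order on pairs of natural numbers, and checks both properties by brute force over a small range. The names `plus`, `pair_less` and `check_par_order` are hypothetical and the natural-number metrics are a simplification of the abstract metric $M$.

```python
from itertools import product


def plus(m1, m2):
    """M1 + M2 is just the pair (M1, M2)."""
    return (m1, m2)


def pair_less(m, n):
    """Strict product order on pairs: no component grows and the pairs differ."""
    return m != n and m[0] <= n[0] and m[1] <= n[1]


def check_par_order(bound):
    """Brute-force check of (5.10) and (5.11) for metrics in range(bound)."""
    for m1, m2, m3 in product(range(bound), repeat=3):
        if m1 < m2:
            if not pair_less(plus(m1, m3), plus(m2, m3)):  # property (5.10)
                return False
            if not pair_less(plus(m3, m1), plus(m3, m2)):  # property (5.11)
                return False
    return True
```

Under this assumed order a step by either thread that decreases its own metric decreases the combined metric, which is what the B-PAR soundness proof relies on.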

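Lemma  15  (Parallel  Compositionality).  If 

1.  $R \lor G_2, G_1, I \models \{P_1 * P\}C_1 \preceq \mathbb{C}_1\{Q_1 * Q_1'\}$; 

2.  $R \lor G_1, G_2, I \models \{P_2 * P\}C_2 \preceq \mathbb{C}_2\{Q_2 * Q_2'\}$; 

3.  $P \lor Q_1' \lor Q_2' \Rightarrow I$;  $I \triangleright \{R, G_1, G_2\}$;  $\mathsf{Sta}(Q_1 * Q_1', (R \lor G_2) * \mathsf{Id})$;  $\mathsf{Sta}(Q_2 * Q_2', (R \lor G_1) * \mathsf{Id})$; 

then  $R, G_1 \lor G_2, I \models \{P_1 * P_2 * P\}\, C_1 \parallel C_2 \preceq \mathbb{C}_1 \,|||\, \mathbb{C}_2 \,\{Q_1 * Q_2 * (Q_1' \land Q_2')\}$. 

Proof:  We  need  to  prove:  for  all  $\sigma$  and  $\Sigma$,  if  $(\sigma, \Sigma) \models P_1 * P_2 * P$,  then  there  exists  $M$  such  that 
$R, G_1 \lor G_2, I \models (C_1 \parallel C_2, \sigma, M) \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma)$. 

From  $(\sigma, \Sigma) \models P_1 * P_2 * P$,  we  know  there  exist  $\sigma_1$,  $\sigma_2$,  $\sigma_r$,  $\Sigma_1$,  $\Sigma_2$  and  $\Sigma_r$  such  that 

\[ (\sigma_1, \Sigma_1) \models P_1, \quad (\sigma_2, \Sigma_2) \models P_2, \quad (\sigma_r, \Sigma_r) \models P, \quad \sigma = \sigma_1 \uplus \sigma_2 \uplus \sigma_r, \quad \Sigma = \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \]

From  the  premises,  we  know  there  exist  $M_1$  and  $M_2$  such  that 

\[ R \lor G_2, G_1, I \models (C_1, \sigma_1 \uplus \sigma_r, M_1) \preceq_{Q_1 * Q_1'} (\mathbb{C}_1, \Sigma_1 \uplus \Sigma_r) \]

\[ R \lor G_1, G_2, I \models (C_2, \sigma_2 \uplus \sigma_r, M_2) \preceq_{Q_2 * Q_2'} (\mathbb{C}_2, \Sigma_2 \uplus \Sigma_r) \]

By  Lemma  16  we  are  done.  □ 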

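Lemma  16.  If 

1.  $R \lor G_2, G_1, I \models (C_1, \sigma_1 \uplus \sigma_r, M_1) \preceq_{Q_1 * Q_1'} (\mathbb{C}_1, \Sigma_1 \uplus \Sigma_r)$; 

2.  $R \lor G_1, G_2, I \models (C_2, \sigma_2 \uplus \sigma_r, M_2) \preceq_{Q_2 * Q_2'} (\mathbb{C}_2, \Sigma_2 \uplus \Sigma_r)$; 

3.  $(\sigma_r, \Sigma_r) \models I$;  $Q_1' \lor Q_2' \Rightarrow I$;  $I \triangleright \{R, G_1, G_2\}$;  $\mathsf{Sta}(Q_1 * Q_1', (R \lor G_2) * \mathsf{Id})$;  $\mathsf{Sta}(Q_2 * Q_2', (R \lor G_1) * \mathsf{Id})$; 

then  $R, G_1 \lor G_2, I \models (C_1 \parallel C_2, \sigma_1 \uplus \sigma_2 \uplus \sigma_r, M_1 + M_2) \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r)$. 

Proof:  By  co-induction.  We  know  $(\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r) \models I * \mathsf{true}$. 

1.  for  any  $\sigma_F$,  $\Sigma_F$,  $C'$  and  $\sigma''$,  if  $(C_1 \parallel C_2, \sigma_1 \uplus \sigma_2 \uplus \sigma_r \uplus \sigma_F) \longrightarrow (C', \sigma'')$,  then  one  of  the  following 
three  cases  holds: 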
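(a)  $C' = C_1' \parallel C_2$  and  $(C_1, \sigma_1 \uplus \sigma_2 \uplus \sigma_r \uplus \sigma_F) \longrightarrow (C_1', \sigma'')$: 
from  the  premise  1,  we  know:  there  exists  $\sigma'$  such  that 

\[ \sigma'' = \sigma' \uplus \sigma_2 \uplus \sigma_F \tag{5.12} \]

and  one  of  the  following  holds: 
i.  there  exist  $M_1'$,  $\mathbb{C}_1'$  and  $\Sigma'$  such  that 

\[ (\mathbb{C}_1, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}_1', \Sigma' \uplus \Sigma_2 \uplus \Sigma_F) \tag{5.13} \]
\[ ((\sigma_1 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_r), (\sigma', \Sigma'), \mathbf{true}) \models G_1^{+} * \mathsf{True} \tag{5.14} \]
\[ R \lor G_2, G_1, I \models (C_1', \sigma', M_1') \preceq_{Q_1 * Q_1'} (\mathbb{C}_1', \Sigma') \tag{5.15} \]

Below  we  prove  that  1(a)  of  the  simulation  definition  holds. 

From  $I \triangleright G_1$,  $(\sigma_r, \Sigma_r) \models I$  and  (5.14),  we  know:  there  exist  $\sigma_1'$,  $\Sigma_1'$,  $\sigma_r'$  and  $\Sigma_r'$  such  that 

\[ \sigma' = \sigma_1' \uplus \sigma_r', \quad \Sigma' = \Sigma_1' \uplus \Sigma_r', \quad (\sigma_r', \Sigma_r') \models I \tag{5.16} \]
\[ ((\sigma_r, \Sigma_r), (\sigma_r', \Sigma_r'), \mathbf{true}) \models G_1^{+} \tag{5.17} \]

From  (5.12)  and  (5.16),  we  know 

\[ \sigma'' = \sigma_1' \uplus \sigma_2 \uplus \sigma_r' \uplus \sigma_F \tag{5.18} \]

From  (5.13)  and  (5.16),  we  know 

\[ (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}_1' \,|||\, \mathbb{C}_2, \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r' \uplus \Sigma_F) \tag{5.19} \]

From  (5.17),  we  know: 

\[ ((\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), (\sigma_1' \uplus \sigma_2 \uplus \sigma_r', \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r'), \mathbf{true}) \models (G_1 \lor G_2)^{+} * \mathsf{True} \tag{5.20} \]

and  $((\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r), (\sigma_2 \uplus \sigma_r', \Sigma_2 \uplus \Sigma_r'), \mathbf{true}) \models (G_1 \lor R)^{+} * \mathsf{Id}$. 
Then  from  the  premise  2,  we  know:  there  exists  $M_2'$  such  that 

\[ R \lor G_1, G_2, I \models (C_2, \sigma_2 \uplus \sigma_r', M_2') \preceq_{Q_2 * Q_2'} (\mathbb{C}_2, \Sigma_2 \uplus \Sigma_r') \tag{5.21} \]

From  (5.15),  (5.16),  (5.21)  and  the  co-induction  hypothesis,  we  know: 

\[ R, G_1 \lor G_2, I \models (C_1' \parallel C_2, \sigma_1' \uplus \sigma_2 \uplus \sigma_r', M_1' + M_2') \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbb{C}_1' \,|||\, \mathbb{C}_2, \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r') \tag{5.22} \]

From  (5.18),  (5.19),  (5.20)  and  (5.22),  we  are  done. 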


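ii.  there  exists  $M_1'$  such  that 

\[ M_1' < M_1 \tag{5.23} \]
\[ ((\sigma_1 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_r), (\sigma', \Sigma_1 \uplus \Sigma_r), \mathbf{false}) \models G_1^{+} * \mathsf{True} \tag{5.24} \]
\[ R \lor G_2, G_1, I \models (C_1', \sigma', M_1') \preceq_{Q_1 * Q_1'} (\mathbb{C}_1, \Sigma_1 \uplus \Sigma_r) \tag{5.25} \]

Below  we  prove  that  1(b)  of  the  simulation  definition  holds. 

From  $I \triangleright G_1$,  $(\sigma_r, \Sigma_r) \models I$  and  (5.24),  we  know:  there  exist  $\sigma_1'$  and  $\sigma_r'$  such  that 

\[ \sigma' = \sigma_1' \uplus \sigma_r', \quad (\sigma_r', \Sigma_r) \models I \tag{5.26} \]
\[ ((\sigma_r, \Sigma_r), (\sigma_r', \Sigma_r), \mathbf{false}) \models G_1^{+} \tag{5.27} \]

From  (5.12)  and  (5.26),  we  know 

\[ \sigma'' = \sigma_1' \uplus \sigma_2 \uplus \sigma_r' \uplus \sigma_F \tag{5.28} \]

From  (5.27),  we  know: 

\[ ((\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), (\sigma_1' \uplus \sigma_2 \uplus \sigma_r', \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), \mathbf{false}) \models (G_1 \lor G_2)^{+} * \mathsf{True} \tag{5.29} \]

and  $((\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r), (\sigma_2 \uplus \sigma_r', \Sigma_2 \uplus \Sigma_r), \mathbf{false}) \models (G_1 \lor R)^{+} * \mathsf{Id}$. 
Then  from  the  premise  2,  we  know: 

\[ R \lor G_1, G_2, I \models (C_2, \sigma_2 \uplus \sigma_r', M_2) \preceq_{Q_2 * Q_2'} (\mathbb{C}_2, \Sigma_2 \uplus \Sigma_r) \tag{5.30} \]

From  (5.25),  (5.26),  (5.30)  and  the  co-induction  hypothesis,  we  know: 

\[ R, G_1 \lor G_2, I \models (C_1' \parallel C_2, \sigma_1' \uplus \sigma_2 \uplus \sigma_r', M_1' + M_2) \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r) \tag{5.31} \]

From  (5.23),  we  get: 

\[ M_1' + M_2 < M_1 + M_2 \tag{5.32} \]

From  (5.28),  (5.29),  (5.31)  and  (5.32),  we  are  done. 

(b)  $C' = C_1 \parallel C_2'$  and  $(C_2, \sigma_1 \uplus \sigma_2 \uplus \sigma_r \uplus \sigma_F) \longrightarrow (C_2', \sigma'')$:  similar  to  the  first  case. 

(c)  $C' = \mathbf{skip}$,  $C_1 = \mathbf{skip}$  and  $C_2 = \mathbf{skip}$,  thus  we  know 

\[ \sigma'' = \sigma_1 \uplus \sigma_2 \uplus \sigma_r \uplus \sigma_F \tag{5.33} \]

Below  we  prove  that  1(a)  of  the  simulation  definition  holds. 

From  the  premise  1,  we  know  one  of  the  following  holds: 
i.  there  exists  $\Sigma'$  such  that 

\[ (\mathbb{C}_1, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma' \uplus \Sigma_2 \uplus \Sigma_F) \tag{5.34} \]
\[ ((\sigma_1 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_r), (\sigma_1 \uplus \sigma_r, \Sigma'), \mathbf{true}) \models G_1^{+} * \mathsf{True} \tag{5.35} \]
\[ (\sigma_1 \uplus \sigma_r, \Sigma') \models Q_1 * Q_1' \tag{5.36} \]

From  $I \triangleright G_1$,  $(\sigma_r, \Sigma_r) \models I$  and  (5.35),  we  know:  there  exist  $\Sigma_1'$  and  $\Sigma_r'$  such  that 

\[ \Sigma' = \Sigma_1' \uplus \Sigma_r', \quad (\sigma_r, \Sigma_r') \models I \tag{5.37} \]
\[ ((\sigma_r, \Sigma_r), (\sigma_r, \Sigma_r'), \mathbf{true}) \models G_1^{+} \tag{5.38} \]

Since  $Q_1' \Rightarrow I$  and  (5.36),  we  get: 

\[ (\sigma_1, \Sigma_1') \models Q_1, \quad (\sigma_r, \Sigma_r') \models Q_1' \tag{5.39} \]

From  (5.34)  and  (5.37),  we  know 

\[ (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip} \,|||\, \mathbb{C}_2, \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r' \uplus \Sigma_F) \tag{5.40} \]

From  (5.38),  we  know:  $((\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r), (\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r'), \mathbf{true}) \models (G_1 \lor R)^{+} * \mathsf{Id}$. 
Then  from  the  premise  2,  we  know:  there  exists  $M'$  such  that 

\[ R \lor G_1, G_2, I \models (C_2, \sigma_2 \uplus \sigma_r, M') \preceq_{Q_2 * Q_2'} (\mathbb{C}_2, \Sigma_2 \uplus \Sigma_r') \tag{5.41} \]

Since  $C_2 = \mathbf{skip}$,  we  know  one  of  the  following  holds: 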
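A.  there  exists  $\Sigma''$  such  that 

\[ (\mathbb{C}_2, \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r' \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma_1' \uplus \Sigma'' \uplus \Sigma_F) \tag{5.42} \]
\[ ((\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r'), (\sigma_2 \uplus \sigma_r, \Sigma''), \mathbf{true}) \models G_2^{+} * \mathsf{True} \tag{5.43} \]
\[ (\sigma_2 \uplus \sigma_r, \Sigma'') \models Q_2 * Q_2' \tag{5.44} \]

From  $I \triangleright G_2$,  $(\sigma_r, \Sigma_r') \models I$  and  (5.43),  we  know:  there  exist  $\Sigma_2'$  and  $\Sigma_r''$  such  that 

\[ \Sigma'' = \Sigma_2' \uplus \Sigma_r'', \quad (\sigma_r, \Sigma_r'') \models I \tag{5.45} \]
\[ ((\sigma_r, \Sigma_r'), (\sigma_r, \Sigma_r''), \mathbf{true}) \models G_2^{+} \tag{5.46} \]

Since  $Q_2' \Rightarrow I$  and  (5.44),  we  get: 

\[ (\sigma_2, \Sigma_2') \models Q_2, \quad (\sigma_r, \Sigma_r'') \models Q_2' \tag{5.47} \]

From  (5.40)  and  (5.42),  we  know 

\[ (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma_1' \uplus \Sigma_2' \uplus \Sigma_r'' \uplus \Sigma_F) \tag{5.48} \]

From  (5.38)  and  (5.46),  we  know: 

\[ ((\sigma_r, \Sigma_r), (\sigma_r, \Sigma_r''), \mathbf{true}) \models (G_1 \lor G_2)^{+} \tag{5.49} \]

Thus  we  get: 

\[ ((\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), (\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1' \uplus \Sigma_2' \uplus \Sigma_r''), \mathbf{true}) \models (G_1 \lor G_2)^{+} * \mathsf{True} \tag{5.50} \]

From  (5.46),  we  get:  $((\sigma_r, \Sigma_r'), (\sigma_r, \Sigma_r''), \mathbf{true}) \models (R \lor G_2)^{+}$.  Since  $(\sigma_1, \Sigma_1') \models Q_1$, 
$(\sigma_r, \Sigma_r') \models Q_1'$,  $\mathsf{Sta}(Q_1 * Q_1', (R \lor G_2) * \mathsf{Id})$,  $I \triangleright (R \lor G_2)$  and  $Q_1' \Rightarrow I$,  we  know: 

\[ (\sigma_r, \Sigma_r'') \models Q_1' \tag{5.51} \]

From  $(\sigma_1, \Sigma_1') \models Q_1$  and  (5.47),  we  get: 

\[ (\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1' \uplus \Sigma_2' \uplus \Sigma_r'') \models Q_1 * Q_2 * (Q_1' \land Q_2') \tag{5.52} \]

By  the  B-SKIP  and  B-FRAME  rules,  we  get:  there  exists  $M'$  such  that 

\[ R, G_1 \lor G_2, I \models (\mathbf{skip}, \sigma_1 \uplus \sigma_2 \uplus \sigma_r, M') \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbf{skip}, \Sigma_1' \uplus \Sigma_2' \uplus \Sigma_r'') \tag{5.53} \]

From  (5.48),  (5.50)  and  (5.53),  we  are  done. 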

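B.  $\mathbb{C}_2 = \mathbf{skip}$  and  $(\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r') \models Q_2 * Q_2'$. 

From  $Q_2' \Rightarrow I$  and  $(\sigma_r, \Sigma_r') \models I$,  we  know: 

\[ (\sigma_2, \Sigma_2) \models Q_2, \quad (\sigma_r, \Sigma_r') \models Q_2' \tag{5.54} \]

From  (5.40),  we  know 

\[ (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r' \uplus \Sigma_F) \tag{5.55} \]

From  (5.38),  we  know: 

\[ ((\sigma_r, \Sigma_r), (\sigma_r, \Sigma_r'), \mathbf{true}) \models (G_1 \lor G_2)^{+} \tag{5.56} \]

Thus  we  get: 

\[ ((\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), (\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r'), \mathbf{true}) \models (G_1 \lor G_2)^{+} * \mathsf{True} \tag{5.57} \]

From  (5.39)  and  (5.54),  we  get: 

\[ (\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r') \models Q_1 * Q_2 * (Q_1' \land Q_2') \tag{5.58} \]

By  the  B-SKIP  and  B-FRAME  rules,  we  get:  there  exists  $M'$  such  that 

\[ R, G_1 \lor G_2, I \models (\mathbf{skip}, \sigma_1 \uplus \sigma_2 \uplus \sigma_r, M') \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbf{skip}, \Sigma_1' \uplus \Sigma_2 \uplus \Sigma_r') \tag{5.59} \]

From  (5.55),  (5.57)  and  (5.59),  we  are  done. 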
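ii.  $\mathbb{C}_1 = \mathbf{skip}$  and  $(\sigma_1 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_r) \models Q_1 * Q_1'$. 

From  $Q_1' \Rightarrow I$  and  $(\sigma_r, \Sigma_r) \models I$,  we  know: 

\[ (\sigma_1, \Sigma_1) \models Q_1, \quad (\sigma_r, \Sigma_r) \models Q_1' \tag{5.60} \]

From  the  premise  2,  we  know  one  of  the  following  holds: 

A.  there  exists  $\Sigma'$  such  that 

\[ (\mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma' \uplus \Sigma_1 \uplus \Sigma_F) \tag{5.61} \]
\[ ((\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r), (\sigma_2 \uplus \sigma_r, \Sigma'), \mathbf{true}) \models G_2^{+} * \mathsf{True} \tag{5.62} \]
\[ (\sigma_2 \uplus \sigma_r, \Sigma') \models Q_2 * Q_2' \tag{5.63} \]

From  $I \triangleright G_2$,  $(\sigma_r, \Sigma_r) \models I$  and  (5.62),  we  know:  there  exist  $\Sigma_2'$  and  $\Sigma_r'$  such  that 

\[ \Sigma' = \Sigma_2' \uplus \Sigma_r', \quad (\sigma_r, \Sigma_r') \models I \tag{5.64} \]
\[ ((\sigma_r, \Sigma_r), (\sigma_r, \Sigma_r'), \mathbf{true}) \models G_2^{+} \tag{5.65} \]

Since  $Q_2' \Rightarrow I$  and  (5.63),  we  get: 

\[ (\sigma_2, \Sigma_2') \models Q_2, \quad (\sigma_r, \Sigma_r') \models Q_2' \tag{5.66} \]

From  (5.61),  we  know 

\[ (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma_1 \uplus \Sigma_2' \uplus \Sigma_r' \uplus \Sigma_F) \tag{5.67} \]

From  (5.65),  we  know: 

\[ ((\sigma_r, \Sigma_r), (\sigma_r, \Sigma_r'), \mathbf{true}) \models (G_1 \lor G_2)^{+} \tag{5.68} \]

Thus  we  get: 

\[ ((\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), (\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2' \uplus \Sigma_r'), \mathbf{true}) \models (G_1 \lor G_2)^{+} * \mathsf{True} \tag{5.69} \]

From  (5.65),  we  get:  $((\sigma_r, \Sigma_r), (\sigma_r, \Sigma_r'), \mathbf{true}) \models (R \lor G_2)^{+}$.  Since  $(\sigma_1, \Sigma_1) \models Q_1$, 
$(\sigma_r, \Sigma_r) \models Q_1'$,  $\mathsf{Sta}(Q_1 * Q_1', (R \lor G_2) * \mathsf{Id})$,  $I \triangleright (R \lor G_2)$  and  $Q_1' \Rightarrow I$,  we  know: 

\[ (\sigma_r, \Sigma_r') \models Q_1' \tag{5.70} \]

From  $(\sigma_1, \Sigma_1) \models Q_1$  and  (5.66),  we  get: 

\[ (\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2' \uplus \Sigma_r') \models Q_1 * Q_2 * (Q_1' \land Q_2') \tag{5.71} \]

By  the  B-SKIP  and  B-FRAME  rules,  we  get:  there  exists  $M'$  such  that 

\[ R, G_1 \lor G_2, I \models (\mathbf{skip}, \sigma_1 \uplus \sigma_2 \uplus \sigma_r, M') \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbf{skip}, \Sigma_1 \uplus \Sigma_2' \uplus \Sigma_r') \tag{5.72} \]

From  (5.67),  (5.69)  and  (5.72),  we  are  done. 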
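B.  $\mathbb{C}_2 = \mathbf{skip}$  and  $(\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r) \models Q_2 * Q_2'$. 
From  $Q_2' \Rightarrow I$  and  $(\sigma_r, \Sigma_r) \models I$,  we  know: 

\[ (\sigma_2, \Sigma_2) \models Q_2, \quad (\sigma_r, \Sigma_r) \models Q_2' \tag{5.73} \]

We  know 

\[ (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \tag{5.74} \]

Also  we  have: 

\[ ((\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), (\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), \mathbf{true}) \models (G_1 \lor G_2)^{+} * \mathsf{True} \tag{5.75} \]

From  (5.60)  and  (5.73),  we  get: 

\[ (\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r) \models Q_1 * Q_2 * (Q_1' \land Q_2') \tag{5.76} \]

By  the  B-SKIP  and  B-FRAME  rules,  we  get:  there  exists  $M'$  such  that 

\[ R, G_1 \lor G_2, I \models (\mathbf{skip}, \sigma_1 \uplus \sigma_2 \uplus \sigma_r, M') \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbf{skip}, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r) \tag{5.77} \]

From  (5.74),  (5.75)  and  (5.77),  we  are  done. 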

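2.  for  any  $\sigma_F$,  $\Sigma_F$,  $e$,  $C'$  and  $\sigma''$,  if  $(C_1 \parallel C_2, \sigma_1 \uplus \sigma_2 \uplus \sigma_r \uplus \sigma_F) \xrightarrow{e} (C', \sigma'')$,  the  proof  is  similar  to  the 
first  case. 

3.  for  any  $\sigma'$  and  $\Sigma'$,  if  $((\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), (\sigma', \Sigma'), \mathbf{true}) \models R^{+} * \mathsf{Id}$, 
from  $I \triangleright R$  and  $(\sigma_r, \Sigma_r) \models I$,  we  know:  there  exist  $\sigma_r'$  and  $\Sigma_r'$  such  that 

\[ \sigma' = \sigma_1 \uplus \sigma_2 \uplus \sigma_r', \quad \Sigma' = \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r', \quad (\sigma_r', \Sigma_r') \models I \tag{5.78} \]
\[ ((\sigma_r, \Sigma_r), (\sigma_r', \Sigma_r'), \mathbf{true}) \models R^{+} \tag{5.79} \]

Thus  we  get: 

\[ ((\sigma_1 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_r), (\sigma_1 \uplus \sigma_r', \Sigma_1 \uplus \Sigma_r'), \mathbf{true}) \models (R \lor G_2)^{+} * \mathsf{Id} \tag{5.80} \]
\[ ((\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r), (\sigma_2 \uplus \sigma_r', \Sigma_2 \uplus \Sigma_r'), \mathbf{true}) \models (R \lor G_1)^{+} * \mathsf{Id} \tag{5.81} \]

From  the  premises,  we  know:  there  exist  $M_1'$  and  $M_2'$  such  that 

\[ R \lor G_2, G_1, I \models (C_1, \sigma_1 \uplus \sigma_r', M_1') \preceq_{Q_1 * Q_1'} (\mathbb{C}_1, \Sigma_1 \uplus \Sigma_r') \tag{5.82} \]
\[ R \lor G_1, G_2, I \models (C_2, \sigma_2 \uplus \sigma_r', M_2') \preceq_{Q_2 * Q_2'} (\mathbb{C}_2, \Sigma_2 \uplus \Sigma_r') \tag{5.83} \]

By  the  co-induction  hypothesis,  we  get: 

\[ R, G_1 \lor G_2, I \models (C_1 \parallel C_2, \sigma_1 \uplus \sigma_2 \uplus \sigma_r', M_1' + M_2') \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r') \tag{5.84} \]

4.  for  any  $\sigma'$  and  $\Sigma'$,  if  $((\sigma_1 \uplus \sigma_2 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r), (\sigma', \Sigma'), \mathbf{false}) \models R^{+} * \mathsf{Id}$, 
from  $I \triangleright R$  and  $(\sigma_r, \Sigma_r) \models I$,  we  know:  there  exist  $\sigma_r'$  and  $\Sigma_r'$  such  that 

\[ \sigma' = \sigma_1 \uplus \sigma_2 \uplus \sigma_r', \quad \Sigma' = \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r', \quad (\sigma_r', \Sigma_r') \models I \tag{5.85} \]
\[ ((\sigma_r, \Sigma_r), (\sigma_r', \Sigma_r'), \mathbf{false}) \models R^{+} \tag{5.86} \]

Thus  we  get: 

\[ ((\sigma_1 \uplus \sigma_r, \Sigma_1 \uplus \Sigma_r), (\sigma_1 \uplus \sigma_r', \Sigma_1 \uplus \Sigma_r'), \mathbf{false}) \models (R \lor G_2)^{+} * \mathsf{Id} \tag{5.87} \]
\[ ((\sigma_2 \uplus \sigma_r, \Sigma_2 \uplus \Sigma_r), (\sigma_2 \uplus \sigma_r', \Sigma_2 \uplus \Sigma_r'), \mathbf{false}) \models (R \lor G_1)^{+} * \mathsf{Id} \tag{5.88} \]

From  the  premises,  we  know: 

\[ R \lor G_2, G_1, I \models (C_1, \sigma_1 \uplus \sigma_r', M_1) \preceq_{Q_1 * Q_1'} (\mathbb{C}_1, \Sigma_1 \uplus \Sigma_r') \tag{5.89} \]
\[ R \lor G_1, G_2, I \models (C_2, \sigma_2 \uplus \sigma_r', M_2) \preceq_{Q_2 * Q_2'} (\mathbb{C}_2, \Sigma_2 \uplus \Sigma_r') \tag{5.90} \]

By  the  co-induction  hypothesis,  we  get: 

\[ R, G_1 \lor G_2, I \models (C_1 \parallel C_2, \sigma_1 \uplus \sigma_2 \uplus \sigma_r', M_1 + M_2) \preceq_{Q_1 * Q_2 * (Q_1' \land Q_2')} (\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r') \tag{5.91} \]

5.  for  any  $\sigma_F$  and  $\Sigma_F$,  if  $(C_1 \parallel C_2, \sigma_1 \uplus \sigma_2 \uplus \sigma_r \uplus \sigma_F) \longrightarrow \mathbf{abort}$,  by  the  operational  semantics  and  the 
premises,  we  know  $(\mathbb{C}_1 \,|||\, \mathbb{C}_2, \Sigma_1 \uplus \Sigma_2 \uplus \Sigma_r \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$. 

Thus  we  are  done.  □ 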
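The  U2B  rule. 

Lemma  17  (U2B).  If  $R, G, I \models \{P \land \mathsf{arem}(\mathbb{C})\}C\{Q \land \mathsf{arem}(\mathbf{skip})\}$,  then  $R, G, I \models \{P\}C \preceq \mathbb{C}\{Q\}$. 

Proof:  We  need  to  prove:  for  all  $\sigma$  and  $\Sigma$,  if  $(\sigma, \Sigma) \models P$,  then  there  exists  $M$  such  that 
$R, G, I \models (C, \sigma, M) \preceq_{Q} (\mathbb{C}, \Sigma)$. 

From  $(\sigma, \Sigma) \models P$,  we  know:  $(\sigma, 0, \mathbb{C}, \Sigma) \models P \land \mathsf{arem}(\mathbb{C})$. 

From  the  premise,  we  know:  $R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathit{height}(C);0;Q \land \mathsf{arem}(\mathbf{skip})} (\mathbb{C}, \Sigma)$. 

By  Lemma  18  we  are  done.  □ 

Lemma  18.  If  $R, G, I \models (C, \sigma, ws) \preceq_{\mathcal{H};w;Q \land \mathsf{arem}(\mathbf{skip})} (\mathbb{C}, \Sigma)$,  then  $R, G, I \models (C, \sigma, (ws, \mathcal{H})) \preceq_{Q} (\mathbb{C}, \Sigma)$. 

Proof:  By  co-induction.  From  the  premise,  we  know  $(\sigma, \Sigma) \models I * \mathsf{true}$. 

1.  for  any  $\sigma_F$,  $\Sigma_F$,  $C'$  and  $\sigma''$,  if  $(C, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$  and  $\Sigma \perp \Sigma_F$, 
from  the  premise,  we  know:  there  exists  $\sigma'$  such  that  $\sigma'' = \sigma' \uplus \sigma_F$  and  one  of  the  following  holds: 

(a)  there  exist  $ws'$,  $w'$,  $\mathbb{C}'$  and  $\Sigma'$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, 
$((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$  and  $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H};w';Q \land \mathsf{arem}(\mathbf{skip})} (\mathbb{C}', \Sigma')$. 
By  the  co-induction  hypothesis,  we  know:  $R, G, I \models (C', \sigma', (ws', \mathcal{H})) \preceq_{Q} (\mathbb{C}', \Sigma')$. 

(b)  there  exists  $ws'$  such  that  $ws' <_{\mathcal{H}} ws$, 
$((\sigma, \Sigma), (\sigma', \Sigma), \mathbf{false}) \models G^{+} * \mathsf{True}$  and  $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H};w;Q \land \mathsf{arem}(\mathbf{skip})} (\mathbb{C}, \Sigma)$. 
By  the  co-induction  hypothesis,  we  know:  $R, G, I \models (C', \sigma', (ws', \mathcal{H})) \preceq_{Q} (\mathbb{C}, \Sigma)$. 
By  the  instantiation  of  the  abstract  metric,  we  know:  $(ws', \mathcal{H}) < (ws, \mathcal{H})$. 

2.  for  any  $\sigma_F$,  $\Sigma_F$,  $e$,  $C'$  and  $\sigma''$,  if  $(C, \sigma \uplus \sigma_F) \xrightarrow{e} (C', \sigma'')$,  the  proof  is  similar  to  the  previous  case. 

3.  for  any  $\sigma'$  and  $\Sigma'$,  if  $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models R^{+} * \mathsf{Id}$, 
from  the  premise,  we  know:  there  exist  $ws'$  and  $w'$  such  that 
$R, G, I \models (C, \sigma', ws') \preceq_{\mathcal{H};w';Q \land \mathsf{arem}(\mathbf{skip})} (\mathbb{C}, \Sigma')$. 
By  the  co-induction  hypothesis,  we  know:  $R, G, I \models (C, \sigma', (ws', \mathcal{H})) \preceq_{Q} (\mathbb{C}, \Sigma')$. 

4.  for  any  $\sigma'$  and  $\Sigma'$,  if  $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{false}) \models R^{+} * \mathsf{Id}$, 
from  the  premise,  we  know:  $R, G, I \models (C, \sigma', ws) \preceq_{\mathcal{H};w;Q \land \mathsf{arem}(\mathbf{skip})} (\mathbb{C}, \Sigma')$. 
By  the  co-induction  hypothesis,  we  know:  $R, G, I \models (C, \sigma', (ws, \mathcal{H})) \preceq_{Q} (\mathbb{C}, \Sigma')$. 

5.  if  $C = \mathbf{skip}$,  then  for  any  $\Sigma_F$,  from  the  premise,  we  know  one  of  the  following  holds: 

(a)  there  exist  $w'$,  $\mathbb{C}'$  and  $\Sigma'$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, 
$((\sigma, \Sigma), (\sigma, \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$  and  $(\sigma, w', \mathbb{C}', \Sigma') \models Q \land \mathsf{arem}(\mathbf{skip})$. 
Thus  we  know  $\mathbb{C}' = \mathbf{skip}$  and  $(\sigma, \Sigma') \models Q$. 

(b)  there  exists  $w'$  such  that  $ws = (w', 0)$  and  $(\sigma, w + w', \mathbb{C}, \Sigma) \models Q \land \mathsf{arem}(\mathbf{skip})$. 
Thus  we  know  $\mathbb{C} = \mathbf{skip}$  and  $(\sigma, \Sigma) \models Q$. 

6.  for  any  $\sigma_F$  and  $\Sigma_F$,  if  $(C, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$,  from  the  premise,  we  know  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$. 

Thus  we  are  done.  □ 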
The  TRANS  rule.  We  define  $M_2 \circ M_1$  as  a  pair  $(M_2, M_1)$  and  the  corresponding  well-founded  order 
as  the  lexical  order.  That  is,  the  following  hold: 

\[ (M_2 < M_2') \Rightarrow (M_2 \circ M_1 < M_2' \circ M_1') \tag{5.92} \]

\[ (M_1 < M_1') \Rightarrow (M_2 \circ M_1 < M_2 \circ M_1') \tag{5.93} \]
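
Properties (5.92) and (5.93) are exactly what a lexical order on pairs gives. As a small sanity check, the sketch below models metrics as natural numbers (an assumption made only for illustration) and uses Python's built-in tuple comparison, which is lexicographic. The names `compose` and `check_trans_order` are hypothetical.

```python
from itertools import product


def compose(m2, m1):
    """M2 o M1 is the pair (M2, M1); Python compares tuples lexicographically."""
    return (m2, m1)


def check_trans_order(bound):
    """Brute-force check of (5.92) and (5.93) for metrics in range(bound)."""
    for m1, m1p, m2, m2p in product(range(bound), repeat=4):
        # (5.92): a decrease in the first component wins whatever the second does.
        if m2 < m2p and not compose(m2, m1) < compose(m2p, m1p):
            return False
        # (5.93): with the first component fixed, the second component decides.
        if m1 < m1p and not compose(m2, m1) < compose(m2, m1p):
            return False
    return True
```

Note that (5.92) lets the lower-level metric grow arbitrarily (from $M_1'$ to $M_1$) as long as the upper-level metric decreases, which a componentwise order would not allow.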

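Lemma  19  (TRANS).  If 

1.  $R_1, G_1, I_1 \models \{P_1\}C \preceq C_M\{Q_1\}$; 

2.  $R_2, G_2, I_2 \models \{P_2\}C_M \preceq \mathbb{C}\{Q_2\}$; 

3.  $\mathsf{MPrecise}(I_1, I_2)$;  $I_1 \triangleright \{R_1, G_1\}$;  $I_2 \triangleright \{R_2, G_2\}$; 

4.  $((G_1)^{+} \fatsemi (G_2)^{+}) \Rightarrow (G_1 \fatsemi G_2)^{+}$;  $(R_1 \fatsemi R_2)^{+} \Rightarrow ((R_1)^{+} \fatsemi (R_2)^{+})$; 

then  $(R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models \{P_1 \fatsemi P_2\}C \preceq \mathbb{C}\{Q_1 \fatsemi Q_2\}$. 

Proof:  For  all  $\sigma$  and  $\Sigma$,  if  $(\sigma, \Sigma) \models P_1 \fatsemi P_2$,  we  know  there  exists  $\theta$  such  that  $(\sigma, \theta) \models P_1$  and  $(\theta, \Sigma) \models P_2$. 
From  the  premise,  we  know: 

1.  there  exists  $M_1$  such  that  $R_1, G_1, I_1 \models (C, \sigma, M_1) \preceq_{Q_1} (C_M, \theta)$. 

2.  there  exists  $M_2$  such  that  $R_2, G_2, I_2 \models (C_M, \theta, M_2) \preceq_{Q_2} (\mathbb{C}, \Sigma)$. 

By  Lemma  20  we  know  $(R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models (C, \sigma, (M_2 \circ M_1)) \preceq_{Q_1 \fatsemi Q_2} (\mathbb{C}, \Sigma)$.  Thus  we  are 
done.  □ 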


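Lemma  20.  If 

1.  $R_1, G_1, I_1 \models (C, \sigma, M_1) \preceq_{Q_1} (C_M, \theta)$; 

2.  $R_2, G_2, I_2 \models (C_M, \theta, M_2) \preceq_{Q_2} (\mathbb{C}, \Sigma)$; 

3.  $\mathsf{MPrecise}(I_1, I_2)$;  $I_1 \triangleright \{R_1, G_1\}$;  $I_2 \triangleright \{R_2, G_2\}$; 

4.  $((G_1)^{+} \fatsemi (G_2)^{+}) \Rightarrow (G_1 \fatsemi G_2)^{+}$;  $(R_1 \fatsemi R_2)^{+} \Rightarrow ((R_1)^{+} \fatsemi (R_2)^{+})$; 

then  $(R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models (C, \sigma, (M_2 \circ M_1)) \preceq_{Q_1 \fatsemi Q_2} (\mathbb{C}, \Sigma)$. 

Proof:  By  co-induction.  By  the  premises,  we  know  $(\sigma, \theta) \models I_1 * \mathsf{true}$  and  $(\theta, \Sigma) \models I_2 * \mathsf{true}$.  Since 
$\mathsf{MPrecise}(I_1, I_2)$,  we  know  $(\sigma, \Sigma) \models (I_1 \fatsemi I_2) * \mathsf{true}$. 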

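1.  for  any  $\sigma_F$,  $\Sigma_F$,  $C'$  and  $\sigma''$,  if  $(C, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$,  then  by  the  premise  1,  we  know: 
there  exists  $\sigma'$  such  that  $\sigma'' = \sigma' \uplus \sigma_F$  and  for  any  $\theta_F$  one  of  the  following  holds: 

(a)  either,  there  exist  $M_1'$,  $C_M'$  and  $\theta'$  such  that  $(C_M, \theta \uplus \theta_F) \longrightarrow^{+} (C_M', \theta' \uplus \theta_F)$, 
$((\sigma, \theta), (\sigma', \theta'), \mathbf{true}) \models (G_1)^{+} * \mathsf{True}$  and  $R_1, G_1, I_1 \models (C', \sigma', M_1') \preceq_{Q_1} (C_M', \theta')$. 

By  the  premise  2  and  Lemma  21,  we  know:  one  of  the  following  holds: 

i.  either,  there  exist  $M_2'$,  $\mathbb{C}'$  and  $\Sigma'$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, 
$((\theta, \Sigma), (\theta', \Sigma'), \mathbf{true}) \models (G_2)^{+} * \mathsf{True}$  and  $R_2, G_2, I_2 \models (C_M', \theta', M_2') \preceq_{Q_2} (\mathbb{C}', \Sigma')$. 

Thus  we  know 

\[ ((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models ((G_1)^{+} * \mathsf{True}) \fatsemi ((G_2)^{+} * \mathsf{True}) \tag{5.94} \]

Since  $I_1 \triangleright G_1$  and  $I_2 \triangleright G_2$,  we  know  $I_1 \triangleright (G_1)^{+}$  and  $I_2 \triangleright (G_2)^{+}$.  Since  $\mathsf{MPrecise}(I_1, I_2)$,  by 
Lemma  25,  we  know 

\[ ((G_1)^{+} * \mathsf{True}) \fatsemi ((G_2)^{+} * \mathsf{True}) \Rightarrow ((G_1)^{+} \fatsemi (G_2)^{+}) * \mathsf{True} \tag{5.95} \]

Thus  we  get: 

\[ ((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models (G_1 \fatsemi G_2)^{+} * \mathsf{True} \tag{5.96} \]

Besides,  by  the  co-induction  hypothesis,  we  get: 

\[ (R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models (C', \sigma', (M_2' \circ M_1')) \preceq_{Q_1 \fatsemi Q_2} (\mathbb{C}', \Sigma') \tag{5.97} \]

ii.  or,  there  exists  $M_2'$  such  that  $M_2' < M_2$, 
$((\theta, \Sigma), (\theta', \Sigma), \mathbf{false}) \models (G_2)^{+} * \mathsf{True}$  and  $R_2, G_2, I_2 \models (C_M', \theta', M_2') \preceq_{Q_2} (\mathbb{C}, \Sigma)$. 

Thus  we  know 

\[ ((\sigma, \Sigma), (\sigma', \Sigma), \mathbf{false}) \models ((G_1)^{+} * \mathsf{True}) \fatsemi ((G_2)^{+} * \mathsf{True}) \tag{5.98} \]

Thus  we  get: 

\[ ((\sigma, \Sigma), (\sigma', \Sigma), \mathbf{false}) \models (G_1 \fatsemi G_2)^{+} * \mathsf{True} \tag{5.99} \]

Besides,  by  the  co-induction  hypothesis,  we  get: 

\[ (R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models (C', \sigma', (M_2' \circ M_1')) \preceq_{Q_1 \fatsemi Q_2} (\mathbb{C}, \Sigma) \tag{5.100} \]

Moreover,  we  know 

\[ (M_2' \circ M_1') < (M_2 \circ M_1) \tag{5.101} \]

(b)  or,  there  exists  $M_1'$  such  that  $M_1' < M_1$, 
$((\sigma, \theta), (\sigma', \theta), \mathbf{false}) \models (G_1)^{+} * \mathsf{True}$  and  $R_1, G_1, I_1 \models (C', \sigma', M_1') \preceq_{Q_1} (C_M, \theta)$. 

Since  $(\theta, \Sigma) \models I_2 * \mathsf{true}$,  we  know  $((\theta, \Sigma), (\theta, \Sigma), \mathbf{false}) \models (G_2)^{+} * \mathsf{True}$.  Thus 

\[ ((\sigma, \Sigma), (\sigma', \Sigma), \mathbf{false}) \models ((G_1)^{+} * \mathsf{True}) \fatsemi ((G_2)^{+} * \mathsf{True}) \tag{5.102} \]

Thus  we  get: 

\[ ((\sigma, \Sigma), (\sigma', \Sigma), \mathbf{false}) \models (G_1 \fatsemi G_2)^{+} * \mathsf{True} \tag{5.103} \]

Besides,  by  the  co-induction  hypothesis,  we  get: 

\[ (R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models (C', \sigma', (M_2 \circ M_1')) \preceq_{Q_1 \fatsemi Q_2} (\mathbb{C}, \Sigma) \tag{5.104} \]

Moreover,  we  know 

\[ (M_2 \circ M_1') < (M_2 \circ M_1) \tag{5.105} \]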


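2.  for  any  $\sigma_F$,  $\Sigma_F$,  $e$,  $C'$  and  $\sigma''$,  if  $(C, \sigma \uplus \sigma_F) \xrightarrow{e} (C', \sigma'')$,  then  by  the  premise  1,  we  know:  for  any 
$\theta_F$,  there  exist  $\sigma'$,  $M_1'$,  $C_M'$  and  $\theta'$  such  that  $\sigma'' = \sigma' \uplus \sigma_F$,  $(C_M, \theta \uplus \theta_F) \xrightarrow{e}{}^{+} (C_M', \theta' \uplus \theta_F)$, 
$((\sigma, \theta), (\sigma', \theta'), \mathbf{true}) \models (G_1)^{+} * \mathsf{True}$  and  $R_1, G_1, I_1 \models (C', \sigma', M_1') \preceq_{Q_1} (C_M', \theta')$. 

By  the  premise  2  and  Lemma  22,  we  know: 
there  exist  $M_2'$,  $\mathbb{C}'$  and  $\Sigma'$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \xrightarrow{e}{}^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, 
$((\theta, \Sigma), (\theta', \Sigma'), \mathbf{true}) \models (G_2)^{+} * \mathsf{True}$  and  $R_2, G_2, I_2 \models (C_M', \theta', M_2') \preceq_{Q_2} (\mathbb{C}', \Sigma')$. 

Thus  we  know 

\[ ((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models ((G_1)^{+} * \mathsf{True}) \fatsemi ((G_2)^{+} * \mathsf{True}) \tag{5.106} \]

Thus  we  get: 

\[ ((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models (G_1 \fatsemi G_2)^{+} * \mathsf{True} \tag{5.107} \]

Besides,  by  the  co-induction  hypothesis,  we  get: 

\[ (R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models (C', \sigma', (M_2' \circ M_1')) \preceq_{Q_1 \fatsemi Q_2} (\mathbb{C}', \Sigma') \tag{5.108} \]
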


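3.  for  any  $\sigma'$  and  $\Sigma'$,  if  $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models (R_1 \fatsemi R_2)^{+} * \mathsf{Id}$,  then  we  know 

\[ ((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models ((R_1)^{+} \fatsemi (R_2)^{+}) * \mathsf{Id} \]

By  Lemma  26,  we  know 

\[ ((R_1)^{+} \fatsemi (R_2)^{+}) * \mathsf{Id} \Rightarrow ((R_1)^{+} * \mathsf{Id}) \fatsemi ((R_2)^{+} * \mathsf{Id}) \]

Thus  we  get:  there  exist  $\theta$,  $\theta'$,  $b_1$  and  $b_2$  such  that  $b = b_1 \lor b_2$, 
$((\sigma, \theta), (\sigma', \theta'), b_1) \models (R_1)^{+} * \mathsf{Id}$  and  $((\theta, \Sigma), (\theta', \Sigma'), b_2) \models (R_2)^{+} * \mathsf{Id}$. 
From  the  premises,  we  know:  there  exist  $M_1'$  and  $M_2'$  such  that 

(a)  $R_1, G_1, I_1 \models (C, \sigma', M_1') \preceq_{Q_1} (C_M, \theta')$; 

(b)  $R_2, G_2, I_2 \models (C_M, \theta', M_2') \preceq_{Q_2} (\mathbb{C}, \Sigma')$. 

By  the  co-induction  hypothesis,  we  get: 

\[ (R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models (C, \sigma', (M_2' \circ M_1')) \preceq_{Q_1 \fatsemi Q_2} (\mathbb{C}, \Sigma') \]

4.  for  any  $\sigma'$  and  $\Sigma'$,  if  $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{false}) \models (R_1 \fatsemi R_2)^{+} * \mathsf{Id}$,  then  we  know 

\[ ((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{false}) \models ((R_1)^{+} * \mathsf{Id}) \fatsemi ((R_2)^{+} * \mathsf{Id}) \]

Thus  we  get:  there  exist  $\theta$  and  $\theta'$  such  that 
$((\sigma, \theta), (\sigma', \theta'), \mathbf{false}) \models (R_1)^{+} * \mathsf{Id}$  and  $((\theta, \Sigma), (\theta', \Sigma'), \mathbf{false}) \models (R_2)^{+} * \mathsf{Id}$. 
From  the  premises,  we  know: 

(a)  $R_1, G_1, I_1 \models (C, \sigma', M_1) \preceq_{Q_1} (C_M, \theta')$; 

(b)  $R_2, G_2, I_2 \models (C_M, \theta', M_2) \preceq_{Q_2} (\mathbb{C}, \Sigma')$. 

By  the  co-induction  hypothesis,  we  get: 

\[ (R_1 \fatsemi R_2), (G_1 \fatsemi G_2), (I_1 \fatsemi I_2) \models (C, \sigma', (M_2 \circ M_1)) \preceq_{Q_1 \fatsemi Q_2} (\mathbb{C}, \Sigma') \]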

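5.  if  $C = \mathbf{skip}$,  then  by  the  premise  1,  we  know:  for  any  $\theta_F$,  one  of  the  following  holds: 

(a)  either,  there  exists  $\theta'$  such  that  $(C_M, \theta \uplus \theta_F) \longrightarrow^{+} (\mathbf{skip}, \theta' \uplus \theta_F)$, 
$((\sigma, \theta), (\sigma, \theta'), \mathbf{true}) \models (G_1)^{+} * \mathsf{True}$  and  $(\sigma, \theta') \models Q_1$. 

By  the  premise  2  and  Lemma  23,  we  know:  for  any  $\Sigma_F$,  one  of  the  following  holds: 

i.  there  exists  $\Sigma'$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma' \uplus \Sigma_F)$, 
$((\theta, \Sigma), (\theta', \Sigma'), \mathbf{true}) \models (G_2)^{+} * \mathsf{True}$  and  $(\theta', \Sigma') \models Q_2$. 

Thus  we  know 

\[ ((\sigma, \Sigma), (\sigma, \Sigma'), \mathbf{true}) \models ((G_1)^{+} * \mathsf{True}) \fatsemi ((G_2)^{+} * \mathsf{True}) \]

Thus  we  get: 

\[ ((\sigma, \Sigma), (\sigma, \Sigma'), \mathbf{true}) \models (G_1 \fatsemi G_2)^{+} * \mathsf{True} \]

Besides,  we  get: 

\[ (\sigma, \Sigma') \models (Q_1 \fatsemi Q_2) \tag{5.119} \]

ii.  or,  $\mathbb{C} = \mathbf{skip}$,  $((\theta, \Sigma), (\theta', \Sigma), \mathbf{false}) \models (G_2)^{+} * \mathsf{True}$  and  $(\theta', \Sigma) \models Q_2$. 
We  get: 

\[ (\sigma, \Sigma) \models (Q_1 \fatsemi Q_2) \]

(b)  or,  $C_M = \mathbf{skip}$  and  $(\sigma, \theta) \models Q_1$. 

By  the  premise  2,  we  know  one  of  the  following  holds: 

i.  there  exists  $\Sigma'$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma' \uplus \Sigma_F)$, 
$((\theta, \Sigma), (\theta, \Sigma'), \mathbf{true}) \models (G_2)^{+} * \mathsf{True}$  and  $(\theta, \Sigma') \models Q_2$. 

Since  $(\sigma, \theta) \models I_1 * \mathsf{true}$,  we  know:  $((\sigma, \theta), (\sigma, \theta), \mathbf{true}) \models (G_1)^{+} * \mathsf{True}$. 

Thus  we  know 

\[ ((\sigma, \Sigma), (\sigma, \Sigma'), \mathbf{true}) \models ((G_1)^{+} * \mathsf{True}) \fatsemi ((G_2)^{+} * \mathsf{True}) \tag{5.120} \]

Thus  we  get: 

\[ ((\sigma, \Sigma), (\sigma, \Sigma'), \mathbf{true}) \models (G_1 \fatsemi G_2)^{+} * \mathsf{True} \tag{5.121} \]

Besides,  we  get: 

\[ (\sigma, \Sigma') \models (Q_1 \fatsemi Q_2) \tag{5.122} \]

ii.  or,  $\mathbb{C} = \mathbf{skip}$  and  $(\theta, \Sigma) \models Q_2$. 

We  get: 

\[ (\sigma, \Sigma) \models (Q_1 \fatsemi Q_2) \tag{5.123} \]

6.  for  any  $\sigma_F$  and  $\Sigma_F$,  if  $(C, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$,  then  by  the  premise  1,  we  know:  for  any  $\theta_F$, 
$(C_M, \theta \uplus \theta_F) \longrightarrow^{+} \mathbf{abort}$.  By  the  premise  2  and  Lemma  24,  we  know:  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$. 

Thus  we  are  done.  □ 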


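Lemma  21.  If  $I \triangleright G$,  $R, G, I \models (C, \sigma, M) \preceq_{Q} (\mathbb{C}, \Sigma)$,  $(C, \sigma \uplus \sigma_F) \longrightarrow^{n+1} (C', \sigma'')$  and  $\Sigma \perp \Sigma_F$,  then  there 
exists  $\sigma'$  such  that  $\sigma'' = \sigma' \uplus \sigma_F$  and  one  of  the  following  holds: 

(1)  either,  there  exist  $M'$,  $\mathbb{C}'$  and  $\Sigma'$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, 
$((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$  and  $R, G, I \models (C', \sigma', M') \preceq_{Q} (\mathbb{C}', \Sigma')$; 

(2)  or,  there  exists  $M'$  such  that  $M' < M$, 
$((\sigma, \Sigma), (\sigma', \Sigma), \mathbf{false}) \models G^{+} * \mathsf{True}$  and  $R, G, I \models (C', \sigma', M') \preceq_{Q} (\mathbb{C}, \Sigma)$. 

Proof:  By  induction  over  $n$. 

Base  Case:  $n = 0$.  By  the  simulation  definition. 

Inductive  Step:  $n = k + 1$.  Thus  there  exist  $C_1$  and  $\sigma_1'$  such  that 

\[ (C, \sigma \uplus \sigma_F) \longrightarrow (C_1, \sigma_1') \quad \text{and} \quad (C_1, \sigma_1') \longrightarrow^{n} (C', \sigma'') \]

By  the  simulation  definition,  we  know  there  exists  $\sigma_1$  such  that  $\sigma_1' = \sigma_1 \uplus \sigma_F$  and  one  of  the  following  holds: 

(i)  either,  there  exist  $M_1$,  $\mathbb{C}_1$  and  $\Sigma_1$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}_1, \Sigma_1 \uplus \Sigma_F)$, 
$((\sigma, \Sigma), (\sigma_1, \Sigma_1), \mathbf{true}) \models G^{+} * \mathsf{True}$  and  $R, G, I \models (C_1, \sigma_1, M_1) \preceq_{Q} (\mathbb{C}_1, \Sigma_1)$. 

By  the  induction  hypothesis,  we  know:  there  exists  $\sigma'$  such  that  $\sigma'' = \sigma' \uplus \sigma_F$  and  one  of  the 
following  holds: 

(a)  either,  there  exist  $M'$,  $\mathbb{C}'$  and  $\Sigma'$  such  that  $(\mathbb{C}_1, \Sigma_1 \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, 
$((\sigma_1, \Sigma_1), (\sigma', \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$  and  $R, G, I \models (C', \sigma', M') \preceq_{Q} (\mathbb{C}', \Sigma')$. 

Then  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$. 

Since  $I \triangleright G$,  we  know  $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$. 

(b)  or,  there  exists  $M'$  such  that  $M' < M_1$, 
$((\sigma_1, \Sigma_1), (\sigma', \Sigma_1), \mathbf{false}) \models G^{+} * \mathsf{True}$  and  $R, G, I \models (C', \sigma', M') \preceq_{Q} (\mathbb{C}_1, \Sigma_1)$. 

Since  $I \triangleright G$,  we  know  $((\sigma, \Sigma), (\sigma', \Sigma_1), \mathbf{true}) \models G^{+} * \mathsf{True}$. 

(ii)  or,  there  exists  $M_1$  such  that  $M_1 < M$, 
$((\sigma, \Sigma), (\sigma_1, \Sigma), \mathbf{false}) \models G^{+} * \mathsf{True}$  and  $R, G, I \models (C_1, \sigma_1, M_1) \preceq_{Q} (\mathbb{C}, \Sigma)$. 

The  case  is  similar. 

Thus  we  are  done.  □ 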


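Lemma  22.  If  $I \triangleright G$,  $R, G, I \models (C, \sigma, M) \preceq_{Q} (\mathbb{C}, \Sigma)$,  $(C, \sigma \uplus \sigma_F) \xrightarrow{e}{}^{n+1} (C', \sigma'')$  and  $\Sigma \perp \Sigma_F$,  then  there 
exist  $\sigma'$,  $M'$,  $\mathbb{C}'$  and  $\Sigma'$  such  that  $\sigma'' = \sigma' \uplus \sigma_F$,  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \xrightarrow{e}{}^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$,  $((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models 
G^{+} * \mathsf{True}$  and  $R, G, I \models (C', \sigma', M') \preceq_{Q} (\mathbb{C}', \Sigma')$. 

Proof:  By  induction  over  $n$.  Similar  to  Lemma  21.  □ 

Lemma  23.  If  $I \triangleright G$,  $R, G, I \models (C, \sigma, M) \preceq_{Q} (\mathbb{C}, \Sigma)$,  $(C, \sigma \uplus \sigma_F) \longrightarrow^{n} (\mathbf{skip}, \sigma'')$  and  $\Sigma \perp \Sigma_F$,  then  there 
exists  $\sigma'$  such  that  $\sigma'' = \sigma' \uplus \sigma_F$  and  one  of  the  following  holds: 

(1)  either,  there  exists  $\Sigma'$  such  that  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbf{skip}, \Sigma' \uplus \Sigma_F)$, 
$((\sigma, \Sigma), (\sigma', \Sigma'), \mathbf{true}) \models G^{+} * \mathsf{True}$  and  $(\sigma', \Sigma') \models Q$; 

(2)  or,  $\mathbb{C} = \mathbf{skip}$,  $((\sigma, \Sigma), (\sigma', \Sigma), \mathbf{false}) \models G^{+} * \mathsf{True}$  and  $(\sigma', \Sigma) \models Q$. 

Proof:  By  induction  over  $n$.  Similar  to  Lemma  21.  □ 

Lemma  24.  If  $R, G, I \models (C, \sigma, M) \preceq_{Q} (\mathbb{C}, \Sigma)$  and  $(C, \sigma \uplus \sigma_F) \longrightarrow^{n} \mathbf{abort}$  and  $\Sigma \perp \Sigma_F$, 
then  $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$. 

Proof:  By  induction  over  $n$.  Similar  to  Lemma  21.  □ 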

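Lemma  25.  If  $I_1 \triangleright G_1$,  $I_2 \triangleright G_2$  and  $\mathsf{MPrecise}(I_1, I_2)$,  then  $(G_1 * \mathsf{True}) \fatsemi (G_2 * \mathsf{True}) \Rightarrow (G_1 \fatsemi G_2) * \mathsf{True}$. 

Proof:  For  any  $\sigma$,  $\Sigma$,  $\sigma'$,  $\Sigma'$  and  $b$,  if  $((\sigma, \Sigma), (\sigma', \Sigma'), b) \models (G_1 * \mathsf{True}) \fatsemi (G_2 * \mathsf{True})$,  we  know  there  exist 
$\theta$,  $\theta'$,  $b_1$  and  $b_2$  such  that 

\[ ((\sigma, \theta), (\sigma', \theta'), b_1) \models (G_1 * \mathsf{True}), \quad ((\theta, \Sigma), (\theta', \Sigma'), b_2) \models (G_2 * \mathsf{True}), \quad b = b_1 \land b_2. \]

Then  we  know  there  exist  $\sigma_1$,  $\theta_1$,  $\sigma_1'$,  $\theta_1'$,  $\theta_2$,  $\Sigma_2$,  $\theta_2'$  and  $\Sigma_2'$  such  that 

\[ ((\sigma_1, \theta_1), (\sigma_1', \theta_1'), b_1) \models G_1, \quad ((\theta_2, \Sigma_2), (\theta_2', \Sigma_2'), b_2) \models G_2, \]
\[ \sigma_1 \subseteq \sigma, \ \theta_1 \subseteq \theta, \ \sigma_1' \subseteq \sigma', \ \theta_1' \subseteq \theta', \ \theta_2 \subseteq \theta, \ \Sigma_2 \subseteq \Sigma, \ \theta_2' \subseteq \theta', \ \Sigma_2' \subseteq \Sigma' \]

Since  $I_1 \triangleright G_1$  and  $I_2 \triangleright G_2$,  we  know 

\[ (\sigma_1, \theta_1) \models I_1, \quad (\sigma_1', \theta_1') \models I_1, \quad (\theta_2, \Sigma_2) \models I_2, \quad (\theta_2', \Sigma_2') \models I_2. \]

Since  $\mathsf{MPrecise}(I_1, I_2)$,  we  know 

\[ \theta_1 = \theta_2, \quad \theta_1' = \theta_2'. \]

Thus  we  know 

\[ ((\sigma_1, \Sigma_2), (\sigma_1', \Sigma_2'), b) \models G_1 \fatsemi G_2 \]

Thus 

\[ ((\sigma, \Sigma), (\sigma', \Sigma'), b) \models (G_1 \fatsemi G_2) * \mathsf{True}. \]

Then  we  are  done.  □ 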

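Lemma  26.  $(R_1 \fatsemi R_2) * \mathsf{Id} \Rightarrow (R_1 * \mathsf{Id}) \fatsemi (R_2 * \mathsf{Id})$. 

Proof:  For  any  $\sigma$,  $\Sigma$,  $\sigma'$,  $\Sigma'$  and  $b$,  if  $((\sigma, \Sigma), (\sigma', \Sigma'), b) \models (R_1 \fatsemi R_2) * \mathsf{Id}$,  we  know  there  exist  $\sigma_1$,  $\Sigma_1$,  $\sigma_1'$, 
$\Sigma_1'$,  $\sigma_2$  and  $\Sigma_2$  such  that 

\[ ((\sigma_1, \Sigma_1), (\sigma_1', \Sigma_1'), b) \models R_1 \fatsemi R_2, \]
\[ \sigma = \sigma_1 \uplus \sigma_2, \quad \Sigma = \Sigma_1 \uplus \Sigma_2, \quad \sigma' = \sigma_1' \uplus \sigma_2, \quad \Sigma' = \Sigma_1' \uplus \Sigma_2 \]

Then  we  know  there  exist  $\theta$,  $\theta'$,  $b_1$  and  $b_2$  such  that 

\[ ((\sigma_1, \theta), (\sigma_1', \theta'), b_1) \models R_1, \quad ((\theta, \Sigma_1), (\theta', \Sigma_1'), b_2) \models R_2, \quad b = b_1 \lor b_2. \]

Thus  we  know 

\[ ((\sigma, \theta), (\sigma', \theta'), b_1) \models R_1 * \mathsf{Id}, \quad ((\theta, \Sigma), (\theta', \Sigma'), b_2) \models R_2 * \mathsf{Id}. \]

Thus 

\[ ((\sigma, \Sigma), (\sigma', \Sigma'), b) \models (R_1 * \mathsf{Id}) \fatsemi (R_2 * \mathsf{Id}). \]

Then  we  are  done.  □ 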
5.4 Soundness of Unary Rules

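Lemma 27. If $R, G, I \vdash \{p\}C\{q\}$, then $I \triangleright \{R, G\}$, $p \vee q \Rightarrow I * \mathsf{true}$ and $\mathsf{Sta}(\{p, q\}, R * \mathsf{Id})$.

Proof: By induction over the derivation of $R, G, I \vdash \{p\}C\{q\}$. For the stability, we need Lemma 28. □

Lemma 28. If $\mathsf{Sta}(p, R * \mathsf{Id})$, then $\mathsf{Sta}(\lfloor p \rfloor_w, R * \mathsf{Id})$.

Lemma 29. If $R, G, I \models (C, \sigma, ws) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$ and $\mathcal{H} < \mathcal{H}'$, then $R, G, I \models (C, \sigma, ws) \preceq_{\mathcal{H}'; w; q} (\mathbb{C}, \Sigma)$.

Proof: We know: if $ws' <_{\mathcal{H}} ws$ and $\mathcal{H} < \mathcal{H}'$, then $ws' <_{\mathcal{H}'} ws$. □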

We define:

$\mathsf{inchead}(ws, (k_1, k_2)) \stackrel{\mathrm{def}}{=} \begin{cases} (w + k_1, n + k_2) & \text{if } ws = (w, n) \\ (w + k_1, n + k_2) :: ws' & \text{if } ws = (w, n) :: ws' \end{cases}$
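The definition of inchead can be made concrete with a small sketch. It assumes (this representation is not from the report) that a stack $ws$ is modelled as a non-empty Python list of `(w, n)` pairs with the head at index 0; the names `Pair` and `Stack` are illustrative only.

```python
from typing import List, Tuple

Pair = Tuple[int, int]   # one (w, n) entry; hypothetical name
Stack = List[Pair]       # non-empty list, head at index 0; hypothetical name


def inchead(ws: Stack, k: Pair) -> Stack:
    """Add (k1, k2) componentwise to the head pair of ws; the tail is unchanged."""
    if not ws:
        raise ValueError("ws must contain at least one (w, n) pair")
    (w, n), tail = ws[0], ws[1:]
    k1, k2 = k
    return [(w + k1, n + k2)] + tail


# Single-pair case: ws = (w, n)
print(inchead([(0, 3)], (2, 0)))
# Cons case: ws = (w, n) :: ws'
print(inchead([(1, 1), (5, 0)], (0, 4)))
```

Under this reading, `inchead([(w1, 0)], (0, k))` yields the one-element stack `[(w1, k)]`, which is the shape used when a stack is extended for a sequential continuation.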

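Lemma 30. If $R, G, I \models (C, \sigma, ws) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$, $w_1 < w$ and $ws_1 = \mathsf{inchead}(ws, (w_1, 0))$, then $R, G, I \models (C, \sigma, ws_1) \preceq_{\mathcal{H}; w - w_1; q} (\mathbb{C}, \Sigma)$.

Proof: By co-induction. From the premise, we know: $(\sigma, \Sigma) \models I * \mathsf{true}$.

1. For any $\sigma_F$, $\Sigma_F$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$ and $\Sigma \perp \Sigma_F$, from the premise, we know there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and one of the following holds:

(a) there exist $ws'$, $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w'; q} (\mathbb{C}', \Sigma')$.

By the co-induction hypothesis, let $ws_1' = \mathsf{inchead}(ws', (w_1, 0))$, we know $R, G, I \models (C', \sigma', ws_1') \preceq_{\mathcal{H}; w' - w_1; q} (\mathbb{C}', \Sigma')$.

(b) there exists $ws'$ such that $ws' <_{\mathcal{H}} ws$, $((\sigma, \Sigma), (\sigma', \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$.

By the co-induction hypothesis, let $ws_1' = \mathsf{inchead}(ws', (w_1, 0))$, we know $R, G, I \models (C', \sigma', ws_1') \preceq_{\mathcal{H}; w - w_1; q} (\mathbb{C}, \Sigma)$.

Since $ws' <_{\mathcal{H}} ws$, we know $ws_1' <_{\mathcal{H}} ws_1$.

2. For any $\sigma_F$, $\Sigma_F$, $e$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \stackrel{e}{\longrightarrow} (C', \sigma'')$ and $\Sigma \perp \Sigma_F$, the proof is similar to the previous case.

3. For any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models R^+ * \mathsf{Id}$, from the premise, we know: there exist $ws'$ and $w'$ such that $R, G, I \models (C, \sigma', ws') \preceq_{\mathcal{H}; w'; q} (\mathbb{C}, \Sigma')$.

By the co-induction hypothesis, let $ws_1' = \mathsf{inchead}(ws', (w_1, 0))$, we know $R, G, I \models (C, \sigma', ws_1') \preceq_{\mathcal{H}; w' - w_1; q} (\mathbb{C}, \Sigma')$.

4. For any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models R^+ * \mathsf{Id}$, from the premise, we know:

$R, G, I \models (C, \sigma', ws) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma')$.

By the co-induction hypothesis, we know $R, G, I \models (C, \sigma', ws_1) \preceq_{\mathcal{H}; w - w_1; q} (\mathbb{C}, \Sigma')$.

5. If $C = \mathbf{skip}$, then for any $\Sigma_F$, if $\Sigma \perp \Sigma_F$, from the premise we know one of the following holds:

(a) there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma, \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $(\sigma, w', \mathbb{C}', \Sigma') \models q$.

(b) there exists $w'$ such that $ws = (w', 0)$ and $(\sigma, w + w', \mathbb{C}, \Sigma) \models q$.

Thus $ws_1 = (w' + w_1, 0)$ and $(\sigma, (w - w_1) + (w' + w_1), \mathbb{C}, \Sigma) \models q$.

6. For any $\sigma_F$ and $\Sigma_F$, if $(C, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$ and $\Sigma \perp \Sigma_F$, from the premise we know:

$(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$.

Thus we are done. □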
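The HIDE-w rule.

Lemma 31 (HIDE-w). If $R, G, I \models \{p\}C\{q\}$, then $R, G, I \models \{\lfloor p \rfloor_w\}C\{\lfloor q \rfloor_w\}$.

Proof: We want to prove: for all $\sigma$, $w_1$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w_1, \mathbb{C}, \Sigma) \models \lfloor p \rfloor_w$, then

$R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathsf{height}(C); w_1; \lfloor q \rfloor_w} (\mathbb{C}, \Sigma)$.

We know there exists $w$ such that

$(\sigma, w, \mathbb{C}, \Sigma) \models p$.

From the premise, we know:

$R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathsf{height}(C); w; q} (\mathbb{C}, \Sigma)$.

By Lemma 32 we are done. □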

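Lemma 32. If $R, G, I \models (C, \sigma, ws) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$, then $R, G, I \models (C, \sigma, ws) \preceq_{\mathcal{H}; w_1; \lfloor q \rfloor_w} (\mathbb{C}, \Sigma)$.

Proof: By co-induction. From the premise, we know: $(\sigma, \Sigma) \models I * \mathsf{true}$.

1. For any $\sigma_F$, $\Sigma_F$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$ and $\Sigma \perp \Sigma_F$, from the premise, we know there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and one of the following holds:

(a) there exist $ws'$, $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w'; q} (\mathbb{C}', \Sigma')$.

By the co-induction hypothesis, we know $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w_1; \lfloor q \rfloor_w} (\mathbb{C}', \Sigma')$.

(b) there exists $ws'$ such that $ws' <_{\mathcal{H}} ws$, $((\sigma, \Sigma), (\sigma', \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$.

By the co-induction hypothesis, we know $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w_1; \lfloor q \rfloor_w} (\mathbb{C}, \Sigma)$.

2. For any $\sigma_F$, $\Sigma_F$, $e$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \stackrel{e}{\longrightarrow} (C', \sigma'')$ and $\Sigma \perp \Sigma_F$, the proof is similar to the previous case.

3. For any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models R^+ * \mathsf{Id}$, from the premise, we know: there exist $ws'$ and $w'$ such that $R, G, I \models (C, \sigma', ws') \preceq_{\mathcal{H}; w'; q} (\mathbb{C}, \Sigma')$.

By the co-induction hypothesis, we know $R, G, I \models (C, \sigma', ws') \preceq_{\mathcal{H}; w_1; \lfloor q \rfloor_w} (\mathbb{C}, \Sigma')$.

4. For any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models R^+ * \mathsf{Id}$, from the premise, we know:

$R, G, I \models (C, \sigma', ws) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma')$.

By the co-induction hypothesis, we know $R, G, I \models (C, \sigma', ws) \preceq_{\mathcal{H}; w_1; \lfloor q \rfloor_w} (\mathbb{C}, \Sigma')$.

5. If $C = \mathbf{skip}$, then for any $\Sigma_F$, if $\Sigma \perp \Sigma_F$, from the premise we know one of the following holds:

(a) there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma, \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $(\sigma, w', \mathbb{C}', \Sigma') \models q$.

Thus $(\sigma, w', \mathbb{C}', \Sigma') \models \lfloor q \rfloor_w$.

(b) there exists $w'$ such that $ws = (w', 0)$ and $(\sigma, w + w', \mathbb{C}, \Sigma) \models q$.

Thus $(\sigma, w_1 + w', \mathbb{C}, \Sigma) \models \lfloor q \rfloor_w$.

6. For any $\sigma_F$ and $\Sigma_F$, if $(C, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$ and $\Sigma \perp \Sigma_F$, from the premise we know:

$(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$.

Thus we are done. □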




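The WHILE rule.

Lemma 33 (WHILE). If

1. $R, G, I \models \{p'\}C\{p\}$;

2. $p \wedge B \Rightarrow p' * (\mathsf{wf}(1) \wedge \mathsf{emp})$;

3. $\mathsf{Sta}(p, R * \mathsf{Id})$; $I \triangleright \{R, G\}$; $p \Rightarrow (B = B) * I$;

then $R, G, I \models \{p\}\,\mathbf{while}\ (B)\ C\,\{p \wedge \neg B\}$.

Proof: We want to prove: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p$, then

$R, G, I \models (\mathbf{while}\ (B)\ C, \sigma, (0, |\mathbf{while}\ (B)\ C|)) \preceq_{\mathsf{height}(\mathbf{while}\ (B)\ C); w; p \wedge \neg B} (\mathbb{C}, \Sigma)$.

We know $|\mathbf{while}\ (B)\ C| = 1$ and can prove $\mathsf{height}(\mathbf{while}\ (B)\ C) = \mathsf{height}(C) + 1$.

By co-induction. From $(\sigma, w, \mathbb{C}, \Sigma) \models p$, since $p \Rightarrow I * (B = B)$, we know:

$(\sigma, \Sigma) \models I * \mathsf{true}$ (5.124)

1. For any $\sigma_F$ and $\Sigma_F$, if $(\mathbf{while}\ (B)\ C, \sigma \uplus \sigma_F) \longrightarrow (C; \mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F)$ and $\llbracket B \rrbracket_{\sigma \uplus \sigma_F} = \mathsf{true}$, below we prove 1(b) of Definition holds.

Since $(\sigma, \Sigma) \models (B = B)$, we know $\llbracket B \rrbracket_{\sigma} = \mathsf{true}$. Then we know

$(\sigma, w, \mathbb{C}, \Sigma) \models p \wedge B$ (5.125)

Since $p \wedge B \Rightarrow p' * (\mathsf{wf}(1) \wedge \mathsf{emp})$, we know there exists $w'$ such that $w' < w$ and

$(\sigma, w', \mathbb{C}, \Sigma) \models p'$ (5.126)

From the premise 1, we know $R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathsf{height}(C); w'; p} (\mathbb{C}, \Sigma)$.

By Lemma 34 we know: let

$ws' = (0, 0) :: (w', |C| + 1)$ (5.127)

then

$R, G, I \models (C; \mathbf{while}\ (B)\{C\}, \sigma, ws') \preceq_{\mathsf{height}(C) + 1; w; p \wedge \neg B} (\mathbb{C}, \Sigma)$ (5.128)

We know $ws' <_{\mathsf{height}(C) + 1} (0, 1)$.

Also, since $I \triangleright G$ and $(\sigma, \Sigma) \models I * \mathsf{true}$, we know $((\sigma, \Sigma), (\sigma, \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$.

2. For any $\sigma_F$ and $\Sigma_F$, if $(\mathbf{while}\ (B)\ C, \sigma \uplus \sigma_F) \longrightarrow (\mathbf{skip}, \sigma \uplus \sigma_F)$ and $\llbracket B \rrbracket_{\sigma \uplus \sigma_F} = \mathsf{false}$, below we prove 1(b) of Definition holds.

Since $(\sigma, \Sigma) \models (B = B)$, we know $\llbracket B \rrbracket_{\sigma} = \mathsf{false}$. Then we know

$(\sigma, w, \mathbb{C}, \Sigma) \models p \wedge \neg B$ (5.129)

By the skip and frame rules, we know:

$R, G, I \models (\mathbf{skip}, \sigma, (0, 0)) \preceq_{\mathsf{height}(C) + 1; w; p \wedge \neg B} (\mathbb{C}, \Sigma)$ (5.130)

We know $(0, 0) <_{\mathsf{height}(C) + 1} (0, 1)$ and $((\sigma, \Sigma), (\sigma, \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$.

3. For any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models R^+ * \mathsf{Id}$, since $\mathsf{Sta}(p, R * \mathsf{Id})$, we know $\mathsf{Sta}(p, R^+ * \mathsf{Id})$, thus there exists $w'$ such that

$(\sigma', w', \mathbb{C}, \Sigma') \models p$ (5.131)

By the co-induction hypothesis, we get:

$R, G, I \models (\mathbf{while}\ (B)\ C, \sigma', (0, 1)) \preceq_{\mathsf{height}(C) + 1; w'; p \wedge \neg B} (\mathbb{C}, \Sigma')$ (5.132)

4. For any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models R^+ * \mathsf{Id}$, since $\mathsf{Sta}(p, R * \mathsf{Id})$, we know $\mathsf{Sta}(p, R^+ * \mathsf{Id})$, thus

$(\sigma', w, \mathbb{C}, \Sigma') \models p$ (5.133)

By the co-induction hypothesis, we get:

$R, G, I \models (\mathbf{while}\ (B)\ C, \sigma', (0, 1)) \preceq_{\mathsf{height}(C) + 1; w; p \wedge \neg B} (\mathbb{C}, \Sigma')$ (5.134)

Thus we are done. □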

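Lemma 34. If

1. $R, G, I \models (C_1, \sigma, ws_1) \preceq_{\mathcal{H}; w_0'; p} (\mathbb{C}, \Sigma)$;

2. for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p'$, then $R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathcal{H}; w; p} (\mathbb{C}, \Sigma)$;

3. $p \wedge B \Rightarrow p' * (\mathsf{wf}(1) \wedge \mathsf{emp})$;

4. $\mathsf{Sta}(p, R * \mathsf{Id})$; $I \triangleright \{R, G\}$; $p \Rightarrow (B = B) * I$;

5. $ws = (0, 0) :: \mathsf{inchead}(ws_1, (w_0', 1))$;

6. $\mathsf{root}(ws_1) = (w_1, \_)$; $w_0' + w_1 < w_0$;

then $R, G, I \models (C_1; \mathbf{while}\ (B)\{C\}, \sigma, ws) \preceq_{\mathcal{H} + 1; w_0; p \wedge \neg B} (\mathbb{C}, \Sigma)$.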

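Proof: By co-induction. From the first premise, we know $(\sigma, \Sigma) \models I * \mathsf{true}$.

1. For any $\sigma_F$, $\Sigma_F$, $C_1'$ and $\sigma''$, if $(C_1; \mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F) \longrightarrow (C_1'; \mathbf{while}\ (B)\{C\}, \sigma'')$, i.e., $(C_1, \sigma \uplus \sigma_F) \longrightarrow (C_1', \sigma'')$, from the premise 1, we know: there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and one of the following holds:

(a) there exist $ws_1'$, $w_0''$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C_1', \sigma', ws_1') \preceq_{\mathcal{H}; w_0''; p} (\mathbb{C}', \Sigma')$.

Suppose $\mathsf{root}(ws_1') = (w_1', \_)$.

By the co-induction hypothesis, let $ws' = (0, 0) :: \mathsf{inchead}(ws_1', (w_0'', 1))$, we know:

$R, G, I \models (C_1'; \mathbf{while}\ (B)\{C\}, \sigma', ws') \preceq_{\mathcal{H} + 1; w_0'' + w_1'; p \wedge \neg B} (\mathbb{C}', \Sigma')$.

(b) there exists $ws_1'$ such that $ws_1' <_{\mathcal{H}} ws_1$, $((\sigma, \Sigma), (\sigma', \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C_1', \sigma', ws_1') \preceq_{\mathcal{H}; w_0'; p} (\mathbb{C}, \Sigma)$.

Suppose $\mathsf{root}(ws_1') = (w_1', \_)$. Since $ws_1' <_{\mathcal{H}} ws_1$, we know $w_1' < w_1$. Thus $w_0' + w_1' < w_0$.

By the co-induction hypothesis, let $ws' = (0, 0) :: \mathsf{inchead}(ws_1', (w_0', 1))$, we know:

$R, G, I \models (C_1'; \mathbf{while}\ (B)\{C\}, \sigma', ws') \preceq_{\mathcal{H} + 1; w_0; p \wedge \neg B} (\mathbb{C}, \Sigma)$.

Since $ws_1' <_{\mathcal{H}} ws_1$, we know: $ws' <_{\mathcal{H} + 1} ws$.

2. For any $\sigma_F$, $\Sigma_F$, $e$, $C_1'$ and $\sigma''$, if $(C_1; \mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F) \stackrel{e}{\longrightarrow} (C_1'; \mathbf{while}\ (B)\{C\}, \sigma'')$, the proof is similar to the previous case.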

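3. For any $\sigma_F$ and $\Sigma_F$, if $(C_1; \mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F) \longrightarrow (\mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F)$, i.e., $C_1 = \mathbf{skip}$, from the premise 1, we know one of the following holds:

(a) there exists $w_1$ such that $ws_1 = (w_1, 0)$ and $(\sigma, w_1 + w_0', \mathbb{C}, \Sigma) \models p$.

Thus $ws = (0, 0) :: (w_1 + w_0', 1)$. We know $(0, 0) :: (w_1 + w_0', 0) <_{\mathcal{H} + 1} ws$.

Also we know $((\sigma, \Sigma), (\sigma, \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$.

Below we prove:

$R, G, I \models (\mathbf{while}\ (B)\{C\}, \sigma, (0, 0) :: (w_1 + w_0', 0)) \preceq_{\mathcal{H} + 1; w_0; p \wedge \neg B} (\mathbb{C}', \Sigma')$ (5.135)

By co-induction. Since $p \Rightarrow I * (B = B)$, we know $(\sigma, \Sigma') \models I * \mathsf{true}$.

i. For any $\sigma_F$ and $\Sigma_F$, if $(\mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F) \longrightarrow (C; \mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F)$ and $\llbracket B \rrbracket_{\sigma \uplus \sigma_F} = \mathsf{true}$, below we prove 1(b) of Definition holds.

Since $(\sigma, \Sigma') \models (B = B)$, we know $\llbracket B \rrbracket_{\sigma} = \mathsf{true}$. Then we know

$(\sigma, w_1 + w_0', \mathbb{C}', \Sigma') \models p \wedge B$ (5.136)

Since $p \wedge B \Rightarrow p' * (\mathsf{wf}(1) \wedge \mathsf{emp})$, we know there exists $w_1'$ such that $w_1' < w_1 + w_0'$ and

$(\sigma, w_1', \mathbb{C}', \Sigma') \models p'$ (5.137)

From the premise 2, we know $R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathcal{H}; w_1'; p} (\mathbb{C}', \Sigma')$.

By the co-induction hypothesis, we know:

$R, G, I \models (C; \mathbf{while}\ (B)\{C\}, \sigma, (0, 0) :: (w_1', |C| + 1)) \preceq_{\mathcal{H} + 1; w_0; p \wedge \neg B} (\mathbb{C}', \Sigma')$ (5.138)

We know $(0, 0) :: (w_1', |C| + 1) <_{\mathcal{H} + 1} (0, 0) :: (w_1 + w_0', 0)$.

Also we know $((\sigma, \Sigma'), (\sigma, \Sigma'), \mathsf{false}) \models G^+ * \mathsf{True}$.

ii. For any $\sigma_F$ and $\Sigma_F$, if $(\mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F) \longrightarrow (\mathbf{skip}, \sigma \uplus \sigma_F)$ and $\llbracket B \rrbracket_{\sigma \uplus \sigma_F} = \mathsf{false}$, below we prove 1(b) of Definition holds.

Since $(\sigma, \Sigma') \models (B = B)$, we know $\llbracket B \rrbracket_{\sigma} = \mathsf{false}$. Since $(\sigma, w_1 + w_0', \mathbb{C}', \Sigma') \models p$, we know:

$(\sigma, w_1 + w_0', \mathbb{C}', \Sigma') \models p \wedge \neg B$ (5.139)

Since $w_1 + w_0' < w_0$, we know:

$(\sigma, w_0, \mathbb{C}', \Sigma') \models p \wedge \neg B$ (5.140)

By the skip and frame rules, we know:

$R, G, I \models (\mathbf{skip}, \sigma, (0, 0)) \preceq_{\mathcal{H} + 1; w_0; p \wedge \neg B} (\mathbb{C}', \Sigma')$ (5.141)

We know $(0, 0) <_{\mathcal{H} + 1} (0, 0) :: (w_1 + w_0', 0)$ and $((\sigma, \Sigma'), (\sigma, \Sigma'), \mathsf{false}) \models G^+ * \mathsf{True}$.

iii. For any $\sigma'$ and $\Sigma''$, if $((\sigma, \Sigma'), (\sigma', \Sigma''), \mathsf{true}) \models R^+ * \mathsf{Id}$, since $\mathsf{Sta}(p, R * \mathsf{Id})$, we know $\mathsf{Sta}(p, R^+ * \mathsf{Id})$, thus there exists $w_1'$ such that

$(\sigma', w_1' + w', \mathbb{C}', \Sigma'') \models p$ (5.142)

By the co-induction hypothesis, we get:

$R, G, I \models (\mathbf{while}\ (B)\{C\}, \sigma', (0, 0) :: (w_1' + w', 0)) \preceq_{\mathcal{H} + 1; w_1' + w'; p \wedge \neg B} (\mathbb{C}', \Sigma'')$ (5.143)

iv. For any $\sigma'$ and $\Sigma''$, if $((\sigma, \Sigma'), (\sigma', \Sigma''), \mathsf{false}) \models R^+ * \mathsf{Id}$, since $\mathsf{Sta}(p, R * \mathsf{Id})$, we know $\mathsf{Sta}(p, R^+ * \mathsf{Id})$, thus

$(\sigma', w_1 + w_0', \mathbb{C}', \Sigma'') \models p$ (5.144)

By the co-induction hypothesis, we get:

$R, G, I \models (\mathbf{while}\ (B)\{C\}, \sigma', (0, 0) :: (w_1 + w_0', 0)) \preceq_{\mathcal{H} + 1; w_0; p \wedge \neg B} (\mathbb{C}', \Sigma'')$ (5.145)

Thus we have proved (5.135).

(b) there exist $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma, \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $(\sigma, w_1', \mathbb{C}', \Sigma') \models p$.

We can prove:

$R, G, I \models (\mathbf{while}\ (B)\{C\}, \sigma, (0, 0) :: (w_1', 0)) \preceq_{\mathcal{H} + 1; w_1'; p \wedge \neg B} (\mathbb{C}', \Sigma')$ (5.146)

in the similar way as the previous case.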
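4. For any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models R^+ * \mathsf{Id}$, from the premise, we know there exist $ws_1'$ and $w_0''$ such that $R, G, I \models (C_1, \sigma', ws_1') \preceq_{\mathcal{H}; w_0''; p} (\mathbb{C}, \Sigma')$.

Suppose $\mathsf{root}(ws_1') = (w_1', \_)$.

By the co-induction hypothesis, we know: let $ws' = (0, 0) :: \mathsf{inchead}(ws_1', (w_0'', 1))$, then $R, G, I \models (C_1; \mathbf{while}\ (B)\{C\}, \sigma', ws') \preceq_{\mathcal{H} + 1; w_0'' + w_1'; p \wedge \neg B} (\mathbb{C}, \Sigma')$.

5. For any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models R^+ * \mathsf{Id}$, from the premise, we know: $R, G, I \models (C_1, \sigma', ws_1) \preceq_{\mathcal{H}; w_0'; p} (\mathbb{C}, \Sigma')$.

By the co-induction hypothesis, we know:

$R, G, I \models (C_1; \mathbf{while}\ (B)\{C\}, \sigma', ws) \preceq_{\mathcal{H} + 1; w_0; p \wedge \neg B} (\mathbb{C}, \Sigma')$.

6. For any $\sigma_F$ and $\Sigma_F$, if $(C_1; \mathbf{while}\ (B)\{C\}, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$, we know $(C_1, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$. By the premise 1, we know: $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$.

Thus we are done. □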


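The SEQ rule.

Lemma 35 (SEQ). If

1. $R, G, I \models \{p\}C_1\{p'\}$;

2. $R, G, I \models \{p'\}C_2\{q\}$;

3. $I \triangleright G$;

then $R, G, I \models \{p\}C_1; C_2\{q\}$.

Proof: We want to prove: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p$, then

$R, G, I \models (C_1; C_2, \sigma, (0, |C_1; C_2|)) \preceq_{\mathsf{height}(C_1; C_2); w; q} (\mathbb{C}, \Sigma)$.

We know $|C_1; C_2| = |C_1| + |C_2| + 1$ and can prove $\mathsf{height}(C_1; C_2) = \max\{\mathsf{height}(C_1), \mathsf{height}(C_2)\}$.

Since $(\sigma, w, \mathbb{C}, \Sigma) \models p$, by the premise 1, we know:

$R, G, I \models (C_1, \sigma, (0, |C_1|)) \preceq_{\mathsf{height}(C_1); w; p'} (\mathbb{C}, \Sigma)$.

By Lemma we know: $R, G, I \models (C_1, \sigma, (0, |C_1|)) \preceq_{\mathsf{height}(C_1; C_2); w; p'} (\mathbb{C}, \Sigma)$.

From the premise 2, by Lemma we know: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p'$, then $R, G, I \models (C_2, \sigma, (0, |C_2|)) \preceq_{\mathsf{height}(C_1; C_2); w; q} (\mathbb{C}, \Sigma)$.

By Lemma 36 we are done. □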
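The size and height facts used in these proofs ($|c| = 1$, $|\langle C \rangle| = 1$, $|\mathbf{while}\ (B)\ C| = 1$, $|C_1; C_2| = |C_1| + |C_2| + 1$, and the matching height equations) can be checked on small examples. The sketch below is only an illustration: the AST classes `Prim`, `Atomic`, `While` and `Seq` are hypothetical names introduced here, and only the constructors whose size and height are stated in this section are covered.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Prim:            # a primitive instruction c (hypothetical class)
    text: str


@dataclass(frozen=True)
class Atomic:          # an atomic block <C>
    body: "Cmd"


@dataclass(frozen=True)
class While:           # while (B) C
    cond: str
    body: "Cmd"


@dataclass(frozen=True)
class Seq:             # C1; C2
    first: "Cmd"
    second: "Cmd"


Cmd = Union[Prim, Atomic, While, Seq]


def size(c: Cmd) -> int:
    """|C| as stated in the proofs: only sequencing adds the sizes of its parts."""
    if isinstance(c, Seq):
        return size(c.first) + size(c.second) + 1
    return 1  # |c| = |<C>| = |while (B) C| = 1


def height(c: Cmd) -> int:
    """height(C) as stated in the proofs."""
    if isinstance(c, Seq):
        return max(height(c.first), height(c.second))
    if isinstance(c, While):
        return height(c.body) + 1
    return 1  # height(c) = height(<C>) = 1


loop = While("x > 0", Prim("x := x - 1"))
prog = Seq(Prim("x := 3"), loop)
print(size(prog), height(prog))
```

For `prog` the initial stack $(0, |C|)$ used in the proofs would be $(0, 3)$, and the height index would be 2, since the loop contributes $\mathsf{height}(C) + 1$.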

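Lemma 36. If

1. $R, G, I \models (C_1, \sigma, ws_1) \preceq_{\mathcal{H}; w; p'} (\mathbb{C}, \Sigma)$;

2. for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p'$, then $R, G, I \models (C_2, \sigma, (0, |C_2|)) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$;

3. $I \triangleright G$;

4. $ws = \mathsf{inchead}(ws_1, (0, |C_2| + 1))$;

then $R, G, I \models (C_1; C_2, \sigma, ws) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$.

Proof: By co-induction. From the premise 1, we know: $(\sigma, \Sigma) \models I * \mathsf{true}$.

1. for any $\sigma_F$, $\Sigma_F$, $C_1'$ and $\sigma''$, if $(C_1; C_2, \sigma \uplus \sigma_F) \longrightarrow (C_1'; C_2, \sigma'')$, i.e., $(C_1, \sigma \uplus \sigma_F) \longrightarrow (C_1', \sigma'')$, from the premise 1, we know: there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and one of the following holds:

(a) there exist $ws_1'$, $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C_1', \sigma', ws_1') \preceq_{\mathcal{H}; w'; p'} (\mathbb{C}', \Sigma')$.

By the co-induction hypothesis, we know: let $ws' = \mathsf{inchead}(ws_1', (0, |C_2| + 1))$, then $R, G, I \models (C_1'; C_2, \sigma', ws') \preceq_{\mathcal{H}; w'; q} (\mathbb{C}', \Sigma')$.

(b) there exists $ws_1'$ such that $ws_1' <_{\mathcal{H}} ws_1$, $((\sigma, \Sigma), (\sigma', \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C_1', \sigma', ws_1') \preceq_{\mathcal{H}; w; p'} (\mathbb{C}, \Sigma)$.

By the co-induction hypothesis, we know: let $ws' = \mathsf{inchead}(ws_1', (0, |C_2| + 1))$, then $R, G, I \models (C_1'; C_2, \sigma', ws') \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$.

Since $ws_1' <_{\mathcal{H}} ws_1$, we know: $ws' <_{\mathcal{H}} ws$.

2. for any $\sigma_F$, $\Sigma_F$, $e$, $C_1'$ and $\sigma''$, if $(C_1; C_2, \sigma \uplus \sigma_F) \stackrel{e}{\longrightarrow} (C_1'; C_2, \sigma'')$, the proof is similar to the previous case.

3. for any $\sigma_F$ and $\Sigma_F$, if $(C_1; C_2, \sigma \uplus \sigma_F) \longrightarrow (C_2, \sigma \uplus \sigma_F)$ and $C_1 = \mathbf{skip}$, from the premise 1, we know one of the following holds:

(a) there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma, \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $(\sigma, w', \mathbb{C}', \Sigma') \models p'$.

From the premise 2, we know: $R, G, I \models (C_2, \sigma, (0, |C_2|)) \preceq_{\mathcal{H}; w'; q} (\mathbb{C}', \Sigma')$.

(b) there exists $w_1$ such that $ws_1 = (w_1, 0)$ and $(\sigma, w + w_1, \mathbb{C}, \Sigma) \models p'$.

Thus we know $ws = (w_1, |C_2| + 1)$.

We know $(w_1, |C_2|) <_{\mathcal{H}} ws$.

Since $(\sigma, \Sigma) \models I * \mathsf{true}$ and $I \triangleright G$, we know $((\sigma, \Sigma), (\sigma, \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$.

From the premise 2, we know: $R, G, I \models (C_2, \sigma, (0, |C_2|)) \preceq_{\mathcal{H}; w + w_1; q} (\mathbb{C}, \Sigma)$.

By Lemma 30 we get: $R, G, I \models (C_2, \sigma, (w_1, |C_2|)) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$.

4. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models R^+ * \mathsf{Id}$, from the premise, we know: there exist $ws_1'$ and $w'$ such that

$R, G, I \models (C_1, \sigma', ws_1') \preceq_{\mathcal{H}; w'; p'} (\mathbb{C}, \Sigma')$.

By the co-induction hypothesis, we know: let $ws' = \mathsf{inchead}(ws_1', (0, |C_2| + 1))$, then $R, G, I \models (C_1; C_2, \sigma', ws') \preceq_{\mathcal{H}; w'; q} (\mathbb{C}, \Sigma')$.

5. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models R^+ * \mathsf{Id}$, from the premise, we know: $R, G, I \models (C_1, \sigma', ws_1) \preceq_{\mathcal{H}; w; p'} (\mathbb{C}, \Sigma')$.

By the co-induction hypothesis, we know: $R, G, I \models (C_1; C_2, \sigma', ws) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma')$.

6. for any $\sigma_F$ and $\Sigma_F$, if $(C_1; C_2, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$, we know: $(C_1, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$. By the premise 1, we know: $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$.

Thus we are done. □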


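The ATOM rule.

Lemma 37 (ATOM). If

1. $\vdash_{SL} [p]C[q]$;

2. $(\lfloor\lfloor p \rfloor\rfloor \ltimes \lfloor\lfloor q \rfloor\rfloor) \Rightarrow G * \mathsf{True}$;

3. $p \vee q \Rightarrow I * \mathsf{true}$;

4. $\mathsf{Locality}(C)$;

then $[I], G, I \models \{p\}\langle C \rangle\{q\}$.

Proof: We want to prove: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p$, then

$[I], G, I \models (\langle C \rangle, \sigma, (0, |\langle C \rangle|)) \preceq_{\mathsf{height}(\langle C \rangle); w; q} (\mathbb{C}, \Sigma)$.

We know $|\langle C \rangle| = 1$ and can prove $\mathsf{height}(\langle C \rangle) = 1$.

By co-induction. Since $p \Rightarrow I * \mathsf{true}$, we know $(\sigma, \Sigma) \models I * \mathsf{true}$. From the premises 1 and 2, we can prove:

$(C, \sigma) \not\longrightarrow^{*} \mathbf{abort}$, $(C, \sigma) \not\longrightarrow^{\omega} \cdot$ (5.147)

By $\mathsf{Locality}(C)$, we know: for any $\sigma_F$,

$(C, \sigma \uplus \sigma_F) \not\longrightarrow^{*} \mathbf{abort}$, $(C, \sigma \uplus \sigma_F) \not\longrightarrow^{\omega} \cdot$ (5.148)

1. for any $\sigma_F$, $\Sigma_F$, $C'$ and $\sigma''$, if $(\langle C \rangle, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$, by the operational semantics, we know $C' = \mathbf{skip}$ and

$(C, \sigma \uplus \sigma_F) \longrightarrow^{*} (\mathbf{skip}, \sigma'')$ (5.149)

By $\mathsf{Locality}(C)$, we know: there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and $(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')$.

From $\models_{SL} [p]C[q]$ and $(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')$, we know:

$(\sigma', w, \mathbb{C}, \Sigma) \models q$ (5.150)

Thus we know:

$((\sigma, \Sigma), (\sigma', \Sigma), \mathsf{false}) \models \lfloor\lfloor p \rfloor\rfloor \ltimes \lfloor\lfloor q \rfloor\rfloor$ (5.151)

Since $(\lfloor\lfloor p \rfloor\rfloor \ltimes \lfloor\lfloor q \rfloor\rfloor) \Rightarrow G * \mathsf{True}$, we know $((\sigma, \Sigma), (\sigma', \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$.

Since $q \Rightarrow I * \mathsf{true}$ and $\mathsf{Sta}(q, [I] * \mathsf{Id})$, by the skip and frame rules, we know:

$[I], G, I \models (\mathbf{skip}, \sigma', (0, 0)) \preceq_{1; w; q} (\mathbb{C}, \Sigma)$ (5.152)

Also, we know: $(0, 0) <_{1} (0, 1)$.

2. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models ([I])^+ * \mathsf{Id}$, we know $\sigma' = \sigma$ and $\Sigma' = \Sigma$.

By the co-induction hypothesis, we know: $[I], G, I \models (\langle C \rangle, \sigma, (0, 1)) \preceq_{1; w; q} (\mathbb{C}, \Sigma)$.

3. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models ([I])^+ * \mathsf{Id}$, we know $\sigma' = \sigma$ and $\Sigma' = \Sigma$.

By the co-induction hypothesis, we know: $[I], G, I \models (\langle C \rangle, \sigma, (0, 1)) \preceq_{1; w; q} (\mathbb{C}, \Sigma)$.

Thus we are done.

The ATOM+ rule.

Lemma 38 (ATOM+). If

1. $\vdash_{SL} [p']C[q']$;

2. $p \Rightarrow^{a} p'$; $q' \Rightarrow^{b} q$; $+ \in \{a, b\}$;

3. $(\lfloor\lfloor p \rfloor\rfloor \propto \lfloor\lfloor q \rfloor\rfloor) \Rightarrow G * \mathsf{True}$;

4. $p \vee q \Rightarrow I * \mathsf{true}$;

5. $\mathsf{Locality}(C)$;

then $[I], G, I \models \{p\}\langle C \rangle\{q\}$.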

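Proof: We want to prove: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p$, then

$[I], G, I \models (\langle C \rangle, \sigma, (0, |\langle C \rangle|)) \preceq_{\mathsf{height}(\langle C \rangle); w; q} (\mathbb{C}, \Sigma)$.

We know $|\langle C \rangle| = 1$ and can prove $\mathsf{height}(\langle C \rangle) = 1$.

By co-induction. Since $p \Rightarrow I * \mathsf{true}$, we know $(\sigma, \Sigma) \models I * \mathsf{true}$. From the premises 1 and 2, we can prove:

$(C, \sigma) \not\longrightarrow^{*} \mathbf{abort}$, $(C, \sigma) \not\longrightarrow^{\omega} \cdot$ (5.153)

By $\mathsf{Locality}(C)$, we know: for any $\sigma_F$,

$(C, \sigma \uplus \sigma_F) \not\longrightarrow^{*} \mathbf{abort}$, $(C, \sigma \uplus \sigma_F) \not\longrightarrow^{\omega} \cdot$ (5.154)

1. for any $\sigma_F$, $\Sigma_F$, $C'$ and $\sigma''$, if $(\langle C \rangle, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$, by the operational semantics, we know $C' = \mathbf{skip}$ and

$(C, \sigma \uplus \sigma_F) \longrightarrow^{*} (\mathbf{skip}, \sigma'')$ (5.155)

By $\mathsf{Locality}(C)$, we know: there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and $(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')$.

From $p \Rightarrow^{a} p'$, we know one of the following holds:

(a) either, $a$ is $+$, and there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$ and $(\sigma, w', \mathbb{C}', \Sigma') \models p'$;

(b) or, $a$ is $0$, and there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\sigma, w', \mathbb{C}', \Sigma') \models p'$, $w' = w$, $\mathbb{C}' = \mathbb{C}$ and $\Sigma' = \Sigma$.

For either case, from $\models_{SL} [p']C[q']$ and $(C, \sigma) \longrightarrow^{*} (\mathbf{skip}, \sigma')$, we know:

$(\sigma', w', \mathbb{C}', \Sigma') \models q'$ (5.156)

From $q' \Rightarrow^{b} q$, we know one of the following holds:

(a) either, $b$ is $+$, and there exist $w''$, $\mathbb{C}''$ and $\Sigma''$ such that $(\mathbb{C}', \Sigma' \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}'', \Sigma'' \uplus \Sigma_F)$ and $(\sigma', w'', \mathbb{C}'', \Sigma'') \models q$;

(b) or, $b$ is $0$, and there exist $w''$, $\mathbb{C}''$ and $\Sigma''$ such that $(\sigma', w'', \mathbb{C}'', \Sigma'') \models q$, $w'' = w'$, $\mathbb{C}'' = \mathbb{C}'$ and $\Sigma'' = \Sigma'$.

Since $+ \in \{a, b\}$, we know the following must hold:

there exist $w''$, $\mathbb{C}''$ and $\Sigma''$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}'', \Sigma'' \uplus \Sigma_F)$ and $(\sigma', w'', \mathbb{C}'', \Sigma'') \models q$.

We know:

$((\sigma, \Sigma), (\sigma', \Sigma''), \mathsf{true}) \models \lfloor\lfloor p \rfloor\rfloor \propto \lfloor\lfloor q \rfloor\rfloor$ (5.157)

Since $(\lfloor\lfloor p \rfloor\rfloor \propto \lfloor\lfloor q \rfloor\rfloor) \Rightarrow G * \mathsf{True}$, we know $((\sigma, \Sigma), (\sigma', \Sigma''), \mathsf{true}) \models G^+ * \mathsf{True}$.

Since $q \Rightarrow I * \mathsf{true}$ and $\mathsf{Sta}(q, [I] * \mathsf{Id})$, by the skip and frame rules, we know:

$[I], G, I \models (\mathbf{skip}, \sigma', (0, 0)) \preceq_{1; w''; q} (\mathbb{C}'', \Sigma'')$ (5.158)

2. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models ([I])^+ * \mathsf{Id}$, we know $\sigma' = \sigma$ and $\Sigma' = \Sigma$.

By the co-induction hypothesis, we know: $[I], G, I \models (\langle C \rangle, \sigma, (0, 1)) \preceq_{1; w; q} (\mathbb{C}, \Sigma)$.

3. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models ([I])^+ * \mathsf{Id}$, we know $\sigma' = \sigma$ and $\Sigma' = \Sigma$.

By the co-induction hypothesis, we know: $[I], G, I \models (\langle C \rangle, \sigma, (0, 1)) \preceq_{1; w; q} (\mathbb{C}, \Sigma)$.

Thus we are done. □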


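Lemma 39. If

1. $R, G, I \vdash \{p\}\langle C \rangle\{q\}$;

2. $\vdash_{SL}$ is sound w.r.t. $\models_{SL}$;

3. $\mathsf{Locality}(C)$;

4. $(\sigma, w, \mathbb{C}, \Sigma) \models p$,

then for any $\sigma_F$, $(C, \sigma \uplus \sigma_F) \not\longrightarrow^{*} \mathbf{abort}$ and $(C, \sigma \uplus \sigma_F) \not\longrightarrow^{\omega} \cdot$.

Proof: By induction over the derivation of $R, G, I \vdash \{p\}\langle C \rangle\{q\}$. □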

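The ATOM-R rule.

Lemma 40 (ATOM-R). If

1. $[I], G, I \models \{p\}\langle C \rangle\{q\}$;

2. $\mathsf{Sta}(\{p, q\}, R * \mathsf{Id})$; $I \triangleright \{R, G\}$; $p \vee q \Rightarrow I * \mathsf{true}$;

3. for all $\sigma$ and $\sigma_F$, if $(\sigma, \_) \models p$, then $(C, \sigma \uplus \sigma_F) \not\longrightarrow^{*} \mathbf{abort}$ and $(C, \sigma \uplus \sigma_F) \not\longrightarrow^{\omega} \cdot$;

then $R, G, I \models \{p\}\langle C \rangle\{q\}$.

Proof: We want to prove: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p$, then

$R, G, I \models (\langle C \rangle, \sigma, (0, |\langle C \rangle|)) \preceq_{\mathsf{height}(\langle C \rangle); w; q} (\mathbb{C}, \Sigma)$.

We know $|\langle C \rangle| = 1$ and can prove $\mathsf{height}(\langle C \rangle) = 1$.

By co-induction. Since $p \Rightarrow I * \mathsf{true}$, we know $(\sigma, \Sigma) \models I * \mathsf{true}$.

1. for any $\sigma_F$, $\Sigma_F$, $C'$ and $\sigma''$, if $(\langle C \rangle, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$, by the operational semantics, we know $C' = \mathbf{skip}$ and

$(C, \sigma \uplus \sigma_F) \longrightarrow^{*} (\mathbf{skip}, \sigma'')$ (5.159)

From the first premise, we know:

$[I], G, I \models (\langle C \rangle, \sigma, (0, 1)) \preceq_{1; w; q} (\mathbb{C}, \Sigma)$.

Thus there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and one of the following holds:

(a) there exist $ws'$, $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and

$[I], G, I \models (\mathbf{skip}, \sigma', ws') \preceq_{1; w'; q} (\mathbb{C}', \Sigma')$ (5.160)

From (5.160), we know one of the following holds:

i. there exist $w''$, $\mathbb{C}''$ and $\Sigma''$ such that $(\mathbb{C}', \Sigma' \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}'', \Sigma'' \uplus \Sigma_F)$, $((\sigma', \Sigma'), (\sigma', \Sigma''), \mathsf{true}) \models G^+ * \mathsf{True}$ and $(\sigma', w'', \mathbb{C}'', \Sigma'') \models q$.

Thus we know:

$(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}'', \Sigma'' \uplus \Sigma_F)$ (5.161)

$((\sigma, \Sigma), (\sigma', \Sigma''), \mathsf{true}) \models G^+ * \mathsf{True}$ (5.162)

Since $q \Rightarrow I * \mathsf{true}$ and $\mathsf{Sta}(q, R * \mathsf{Id})$, by the skip and frame rules, we know:

$R, G, I \models (\mathbf{skip}, \sigma', (0, 0)) \preceq_{1; w''; q} (\mathbb{C}'', \Sigma'')$ (5.163)

ii. there exists $w''$ such that $ws' = (w'', 0)$ and $(\sigma', w' + w'', \mathbb{C}', \Sigma') \models q$.

Since $q \Rightarrow I * \mathsf{true}$ and $\mathsf{Sta}(q, R * \mathsf{Id})$, by the skip and frame rules, we know:

$R, G, I \models (\mathbf{skip}, \sigma', (0, 0)) \preceq_{1; w' + w''; q} (\mathbb{C}', \Sigma')$ (5.164)

(b) there exists $ws'$ such that $ws' <_{1} (0, 1)$, $((\sigma, \Sigma), (\sigma', \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$ and

$[I], G, I \models (\mathbf{skip}, \sigma', ws') \preceq_{1; w; q} (\mathbb{C}, \Sigma)$ (5.165)

From (5.165), we know one of the following holds:

i. there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma', \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $(\sigma', w', \mathbb{C}', \Sigma') \models q$.

Thus we know:

$((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ (5.166)

Since $q \Rightarrow I * \mathsf{true}$ and $\mathsf{Sta}(q, R * \mathsf{Id})$, by the skip and frame rules, we know:

$R, G, I \models (\mathbf{skip}, \sigma', (0, 0)) \preceq_{1; w'; q} (\mathbb{C}', \Sigma')$ (5.167)

ii. there exists $w'$ such that $ws' = (w', 0)$ and $(\sigma', w + w', \mathbb{C}, \Sigma) \models q$.

Since $ws' <_{1} (0, 1)$, we know $w' = 0$.

Since $q \Rightarrow I * \mathsf{true}$ and $\mathsf{Sta}(q, R * \mathsf{Id})$, by the skip and frame rules, we know:

$R, G, I \models (\mathbf{skip}, \sigma', (0, 0)) \preceq_{1; w; q} (\mathbb{C}, \Sigma)$ (5.168)

2. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models R^+ * \mathsf{Id}$, since $(\sigma, w, \mathbb{C}, \Sigma) \models p$ and $\mathsf{Sta}(p, R * \mathsf{Id})$, we know there exists $w'$ such that $(\sigma', w', \mathbb{C}, \Sigma') \models p$.

By the co-induction hypothesis, we know: $R, G, I \models (\langle C \rangle, \sigma', (0, 1)) \preceq_{1; w'; q} (\mathbb{C}, \Sigma')$.

3. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models R^+ * \mathsf{Id}$, since $(\sigma, w, \mathbb{C}, \Sigma) \models p$ and $\mathsf{Sta}(p, R * \mathsf{Id})$, we know $(\sigma', w, \mathbb{C}, \Sigma') \models p$.

By the co-induction hypothesis, we know: $R, G, I \models (\langle C \rangle, \sigma', (0, 1)) \preceq_{1; w; q} (\mathbb{C}, \Sigma')$.

Thus we are done. □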


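The A-CONSEQ rule.

Lemma 41 (A-CONSEQ). If

1. $p \Rrightarrow p'$;

2. $R, G, I \models \{p'\}C\{q'\}$;

3. $q' \Rrightarrow q$;

4. $\mathsf{Sta}(\{p, q\}, R * \mathsf{Id})$; $I \triangleright \{R, G\}$; $p \vee q \vee p' \vee q' \Rightarrow I * \mathsf{true}$;

then $R, G, I \models \{p\}C\{q\}$.

Proof: We want to prove: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p$, then

$R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathsf{height}(C); w; q} (\mathbb{C}, \Sigma)$.

Let $\mathcal{H} = \mathsf{height}(C)$.

By co-induction. Since $p \Rightarrow I * \mathsf{true}$, we know $(\sigma, \Sigma) \models I * \mathsf{true}$.

1. for any $\sigma_F$, $\Sigma_F$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \longrightarrow (C', \sigma'')$, from $p \Rrightarrow p'$, we know one of the following holds:

(a) either, there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma, \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $(\sigma, w', \mathbb{C}', \Sigma') \models p'$;

(b) or, there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\sigma, w', \mathbb{C}', \Sigma') \models p'$, $w' = w$, $\mathbb{C}' = \mathbb{C}$ and $\Sigma' = \Sigma$.

For either case, from $R, G, I \models \{p'\}C\{q'\}$, we know:

$R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathcal{H}; w'; q'} (\mathbb{C}', \Sigma')$ (5.169)

Thus there exists $\sigma'$ such that $\sigma'' = \sigma' \uplus \sigma_F$ and one of the following holds:

(a) either, there exist $ws'$, $w''$, $\mathbb{C}''$ and $\Sigma''$ such that $(\mathbb{C}', \Sigma' \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}'', \Sigma'' \uplus \Sigma_F)$, $((\sigma, \Sigma'), (\sigma', \Sigma''), \mathsf{true}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w''; q'} (\mathbb{C}'', \Sigma'')$;

(b) or, there exists $ws'$ such that $ws' <_{\mathcal{H}} (0, |C|)$, $((\sigma, \Sigma'), (\sigma', \Sigma'), \mathsf{false}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w'; q'} (\mathbb{C}', \Sigma')$.

Then, we know one of the following holds:

(a) there exist $ws'$, $w''$, $\mathbb{C}''$ and $\Sigma''$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}'', \Sigma'' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma', \Sigma''), \mathsf{true}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w''; q'} (\mathbb{C}'', \Sigma'')$.

By Lemma 42 we know:

$R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w''; q} (\mathbb{C}'', \Sigma'')$ (5.170)

(b) there exists $ws'$ such that $ws' <_{\mathcal{H}} (0, |C|)$, $((\sigma, \Sigma), (\sigma', \Sigma), \mathsf{false}) \models G^+ * \mathsf{True}$ and $R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w; q'} (\mathbb{C}, \Sigma)$.

By Lemma 42 we know:

$R, G, I \models (C', \sigma', ws') \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma)$ (5.171)

2. for any $\sigma_F$, $\Sigma_F$, $e$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \stackrel{e}{\longrightarrow} (C', \sigma'')$, the proof is similar to the previous case.

3. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{true}) \models R^+ * \mathsf{Id}$, since $(\sigma, w, \mathbb{C}, \Sigma) \models p$ and $\mathsf{Sta}(p, R * \mathsf{Id})$, we know there exists $w'$ such that $(\sigma', w', \mathbb{C}, \Sigma') \models p$.

By the co-induction hypothesis, we know: $R, G, I \models (C, \sigma', (0, |C|)) \preceq_{\mathcal{H}; w'; q} (\mathbb{C}, \Sigma')$.

4. for any $\sigma'$ and $\Sigma'$, if $((\sigma, \Sigma), (\sigma', \Sigma'), \mathsf{false}) \models R^+ * \mathsf{Id}$, since $(\sigma, w, \mathbb{C}, \Sigma) \models p$ and $\mathsf{Sta}(p, R * \mathsf{Id})$, we know $(\sigma', w, \mathbb{C}, \Sigma') \models p$.

By the co-induction hypothesis, we know: $R, G, I \models (C, \sigma', (0, |C|)) \preceq_{\mathcal{H}; w; q} (\mathbb{C}, \Sigma')$.

5. if $C = \mathbf{skip}$, then for any $\Sigma_F$, from $p \Rrightarrow p'$, we know one of the following holds:

(a) either, there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$, $((\sigma, \Sigma), (\sigma, \Sigma'), \mathsf{true}) \models G^+ * \mathsf{True}$ and $(\sigma, w', \mathbb{C}', \Sigma') \models p'$;

(b) or, there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\sigma, w', \mathbb{C}', \Sigma') \models p'$, $w' = w$, $\mathbb{C}' = \mathbb{C}$ and $\Sigma' = \Sigma$.

For either case, from $R, G, I \models \{p'\}C\{q'\}$, we know:

$R, G, I \models (\mathbf{skip}, \sigma, (0, 0)) \preceq_{\mathcal{H}; w'; q'} (\mathbb{C}', \Sigma')$ (5.172)

Then one of the following holds: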
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(a)  either,  there  exist  w” ,  D"  and  S"  such  that  (D',  S'  l±)  Ep)  — >■+  (D",  E"  l±)  E^), 

((cr,  E'),  (cr,  E"),  true)  \=  G+  *  True  and  {a,  lu",©",  E")  ^  g'; 

(b)  or,  there  exist  w” ,  D"  and  E"  such  that  w”  =  w' ,  D"  =  D',  E"  =  E'  and  (ct,  u>",D", ! 
From  q'  =>  q,  we  know  one  of  the  following  holds: 

(a)  either,  there  exist  w'",  D'"  and  E'"  such  that  (D",  E"  l±)  Ei.)  — (D'",  E'"  l±)  Ef) 
((cr,  E"),  (cr,  E'"),  true)  |=  G+  *  True  and  (cr,  w'" ,  D'",  E'")  \=  q; 

(b)  or,  there  exist  w'",  D'"  and  E'"  such  that  (cr,  w'",  D'",  E'")  ^  q,  w'”  =  w" ,  D'"  = 
E'"  =  E". 

Thus  we  get  one  of  the  following  holds: 

(a)  either,  there  exist  w'" ,  C'"  and  E'"  such  that  (D,  E  l±)  Ef)  — (C'",  E'"  l±)  Ef) 

((cr,  E),  (cr,  E'"),  true)  \=  G+  *  True  and  {a,  w'” ,  C'",  E'")  |=  g; 

(b)  or,  (cr,-u;,D,E)  ^  q. 

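6. for any $\sigma_F$ and $\Sigma_F$, if $(C, \sigma \uplus \sigma_F) \longrightarrow \mathbf{abort}$,

from $p \Rightarrow p'$, we know one of the following holds:

(a) either, there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} (\mathbb{C}', \Sigma' \uplus \Sigma_F)$,

$((\sigma, \Sigma), (\sigma, \Sigma'), \mathsf{true}) \models G^{+} * \mathsf{True}$ and $(\sigma, w', \mathbb{C}', \Sigma') \models p'$;

(b) or, there exist $w'$, $\mathbb{C}'$ and $\Sigma'$ such that $(\sigma, w', \mathbb{C}', \Sigma') \models p'$, $w' = w$, $\mathbb{C}' = \mathbb{C}$ and $\Sigma' = \Sigma$.

For either case, from $R, G, I \models \{p'\}C\{q'\}$, we know:

$R, G, I \models (C, \sigma, (0, |C|)) \preceq_{\mathcal{H};w';q'} (\mathbb{C}', \Sigma')$ (5.173)

Then we know: $(\mathbb{C}', \Sigma' \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$. Thus $(\mathbb{C}, \Sigma \uplus \Sigma_F) \longrightarrow^{+} \mathbf{abort}$.

Thus we are done. □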

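Lemma 42. If

1. $R, G, I \models (C, \sigma, ws) \preceq_{\mathcal{H};w;q'} (\mathbb{C}, \Sigma)$;

2. $q' \Rightarrow q$;

3. $\mathsf{Sta}(q, R * \mathsf{Id})$; $I \triangleright \{R, G\}$; $q \Rightarrow I * \mathsf{true}$;

then $R, G, I \models (C, \sigma, ws) \preceq_{\mathcal{H};w;q} (\mathbb{C}, \Sigma)$.

Proof: By co-induction.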

The  ENV  rule. 

Lemma 43 (ENV). If $\vdash_{SL} [p]c[q]$, $c$ is silent and $\mathsf{Locality}(c)$, then $\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \models \{p\}c\{q\}$.

Proof: We want to prove: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p$, then

$\mathsf{Emp}, \mathsf{Emp}, \mathsf{emp} \models (c, \sigma, (0, |c|)) \preceq_{\mathsf{height}(c);w;q} (\mathbb{C}, \Sigma)$.

We know $|c| = 1$ and can prove $\mathsf{height}(c) = 1$.

By  co-induction.  We  know  (a,  E)  |=  emp  *  true.  From  [p]c[9],  we  know: 

(c,  tj)  abort , 

By  Locality(c),  we  know:  for  any  ap, 

(c,  a  &  ap)  *  abort ,  (c,  a  &  ap) ■ 

1.  for  any  ap,  Sp,  C  and  cr",  if  (c,(t  — >■  (C'^a"), 

by  the  operational  semantics,  we  know  C  =  skip. 

By  Locality(c),  we  know:  there  exists  a'  such  that  a"  =  a’  and  (c,  a)  — >■  (skip,  a'). 
From  ^sL  [p]c[9],  we  know: 

((t',w,D,  E)  1=  q 

By  the  skip  rule,  we  know: 

Emp,  Emp,  emp  (skip,  cr',  (0, 0))  dii-w-q  (D,  E) 

We  know  ((cr,  E),  (cr',  E),  false)  \=  Emp^  *  True. 

Also,  we  know:  (0,0)  <i  (0,1). 

2.  for  any  cr'  and  E',  if  ((cr,  E),  (ct',  E'),  true)  |=  Emp^  *  Id,  we  know  a'  —  a  and  E'  =  E. 

By  the  co-induction  hypothesis,  we  know:  Emp,  Emp,  emp  ^  (c,  cr',  (0, 1))  (D,  E'). 

3.  for  any  cr'  and  E',  if  ((cr,  E),  (cr',  E'),  false)  ^  Emp^  *  Id,  we  know  cr'  =  cr  and  E'  =  E. 

By  the  co-induction  hypothesis,  we  know:  Emp,  Emp,  emp  ^  (c,  cr',  (0, 1))  (D,  E'). 

Thus we are done. □

The  FRAME  rule. 

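Lemma 44 (FRAME). If

1. $R, G, I \models \{p\}C\{q\}$;

2. $\mathsf{Sta}(\{p, q\}, R * \mathsf{Id})$; $\mathsf{Sta}(p', (R')^{+} * \mathsf{Id})$; $I \triangleright \{R, G\}$; $I' \triangleright \{R', G'\}$; $p \lor q \Rightarrow I * \mathsf{true}$; $p' \Rightarrow I' * \mathsf{true}$; $G^{+} \Rightarrow G$;

then $R * R', G * G', I * I' \models \{p * p'\}C\{q * p'\}$.

Proof: We want to prove: for all $\sigma$, $w$, $\mathbb{C}$ and $\Sigma$, if $(\sigma, w, \mathbb{C}, \Sigma) \models p * p'$, then

$R * R', G * G', I * I' \models (C, \sigma, (0, |C|)) \preceq_{\mathsf{height}(C);w;q*p'} (\mathbb{C}, \Sigma)$.

Since $(\sigma, w, \mathbb{C}, \Sigma) \models p * p'$, we know: there exist $\sigma_1$, $\sigma_2$, $w_1$, $w_2$, $\mathbb{C}_1$, $\mathbb{C}_2$, $\Sigma_1$ and $\Sigma_2$ such that

$(\sigma_1, w_1, \mathbb{C}_1, \Sigma_1) \models p$, $(\sigma_2, w_2, \mathbb{C}_2, \Sigma_2) \models p'$, $\sigma = \sigma_1 \uplus \sigma_2$, $w = w_1 + w_2$, $\mathbb{C} = \mathbb{C}_1 \uplus \mathbb{C}_2$, $\Sigma = \Sigma_1 \uplus \Sigma_2$.

From the premise, we know: $R, G, I \models (C, \sigma_1, (0, |C|)) \preceq_{\mathsf{height}(C);w_1;q} (\mathbb{C}_1, \Sigma_1)$.

By Lemma 45 we are done. □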

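Lemma 45. If

1. $R, G, I \models (C, \sigma_1, ws) \preceq_{\mathcal{H};w_1;q} (\mathbb{C}_1, \Sigma_1)$;

2. $\mathsf{Sta}(q, R * \mathsf{Id})$; $\mathsf{Sta}(p', (R')^{+} * \mathsf{Id})$; $I \triangleright \{R, G\}$; $I' \triangleright \{R', G'\}$; $q \Rightarrow I * \mathsf{true}$; $p' \Rightarrow I' * \mathsf{true}$; $G^{+} \Rightarrow G$;

3. $(\sigma_2, w_2, \mathbb{C}_2, \Sigma_2) \models p'$; $\sigma = \sigma_1 \uplus \sigma_2$; $\mathbb{C} = \mathbb{C}_1 \uplus \mathbb{C}_2$; $\Sigma = \Sigma_1 \uplus \Sigma_2$;

then $R * R', G * G', I * I' \models (C, \sigma, ws) \preceq_{\mathcal{H};w_1+w_2;q*p'} (\mathbb{C}, \Sigma)$.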

Proof:  By  co-induction.  From  the  premises,  we  know:  ((7i,Si)  \=  I  *  true  and  ((72,  E2)  |=  I'  *  true. 
Thus  we  know:  {a,  E)  \=  I  *  I'  *  true. 

1.  for  any  ap,  G'  and  cr",  if  (G,  ctWctf)  — ^  (G'jCr"), 

from  the  first  premise,  we  know:  there  exists  <j'-^  such  that  a"  =  l±)  (72  W  of-,  and  one  of  the 

following  holds: 

(a)  there  exist  ws' ,  w'-^,  and  E^  such  that  (Di,  Ei  l±)  E2  W  E^^^)  — >•+  (C^,  E'^  l±)  E2  W  E/t-), 

(((7i,Ei),((7;,E;),true)  |=  G+  *  True  and  i?,  G,  /  ^  (C”',  ws')  (Ci,  E'J. 

Since  ((72,  E2)  ^  *  true  and  /'  >  G',  we  know: 

(((72,  E2),  ((72,  E2),  true)  1=  G'  *  True. 

Since  G+  ^  G,  we  know: 

(((7i  l±)  (72,  El  l±)  E2),  ((7(  l±)  CT2,  E(  l±)  E2),true)  |=  (G  *  G')^  *  True. 

Since  D  =  Di  l±)  D2,  we  know  D2  =  •  and  D  =  Di.  Let  D'  =  C(  l±)  D2  =  C'j^. 

By  the  co-induction  hypothesis,  we  know 

i?  *  i?',  G  *  G',  /  *  r  1=  (G',  (7i  W  (72,  ws')  <n-,w',+n,2-,q*p'  (D',  S'l  W  E2). 

(b)  there  exists  ws'  such  that  ws'  <fi  ws, 

(((7i,Ei),((7(,Ei),false)  |=  G+  *  True  and  R,G,I  \=  {C ,a'■^,ws')^-H■^o^■,q{^l,^l)■ 

Smce  ((72,  E2)  \=  I'  *  true  and  I'  >G' ,  we  know: 

(((72,  E2),  ((72,  E2),  false)  ^  G'  *  True. 

Since  G+  ^  G,  we  know: 

(((7i  l±)  (72,  El  l±)  E2),  (a'l  l±)  (72,  El  l±)  E2),  false)  ^  (G  *  G')'*’  *  True. 

By  the  co-induction  hypothesis,  we  know 

R*  R'  ,G  *  G' ,  I  *  I'  ^  (G',  a'l  l±)  (72,  ws')  diH-,wi+w2-,q*p'  (D,  Ei  l±)  E2). 

2.  for  any  ap,  ^f,  e,  G'  and  a" ,  if  (G,  cr  l±)  ap)  — ^  (G',  a"),  the  proof  is  similar  to  the  previous  case. 

3.  for  any  a'  and  E',  if  ((cr,  E),  (cr',  E'),  true)  ^  {R*R')'^  *  Id, 

since  1 1>  R,  I'  >  R' ,  (cti,  Ei)  \=  I  *  true  and  (ct2,  E2)  \=  I'  *  true,  we  know:  there  exist  cr^,  cr^,  T,'^ 
and  E2  such  that  cr'  =  cr'^  l±)  cr^,  E'  =  E'^  l±)  E2, 

((cti,Ei),  (CT(,E'i),true)  |=  i?+  *  Id,  (((72,  E2),  (cr^,  E^),true)  ^  {R')~^  *  Id 

From  the  first  premise,  we  know  there  exist  ws'  and  w'l  such  that 

R,  G,  I  1=  (G,  a'„ws')  <n-,n,'Fq  (Di,  E'^). 

Since  ((72,  W2,  D2,  E2)  \=  p'  and  Sta(p',  (i?')’*’  *  Id),  we  know:  there  exists  w'2  such  that 

(t72,W2,D2,E'2)  '^p'. 

By  the  co-induction  hypothesis,  we  know: 
$R * R', G * G', I * I' \models (C, \sigma', ws') \preceq_{\mathcal{H};w'_1+w'_2;q*p'} (\mathbb{C}, \Sigma')$.

4.  for  any  a'  and  S',  if  ((cr,  S),  (a' ,  S'),  false)  ^  (i?  *  R')'^  *  Id, 

since  1 1>  R,  I'  t>  R' ,  (cti.  Si)  \=  I  true  and  (ct2,  S2)  ^  *  true,  we  know:  there  exist  a'l,  cr^)  5]']^ 

and  S2  such  that  cr'  =  cr^  tt)  tr^ ,  S'  =  S'^  l±)  S2, 

((cri,Si),(CTi,S'i),false)  |=  i?+  *  Id,  ((cr2,  S2),  (cr^,  S^),  false)  ^  (i?')+  *  Id 

From  the  first  premise,  we  know 

i?,  G,  J  h  {C,  ai ,  ws)  (Di,  S'l). 

Since  (0-2,  W2,  D2,  S2)  \=  p'  and  Sta(p',  (i?')'*’  *  Id),  we  know: 

(cr^,-u;2,D2,Sy  ^p'. 

By  the  co-induction  hypothesis,  we  know: 

R*  R' ,G  *  G' ,  I  *  I'  ^  (G,  cr',  ws)  din-,wi+w2\q*p'  (D,  S'). 

5.  if  G  =  skip,  then  for  any  Tip,  from  the  first  premise  we  know  one  of  the  following  holds: 

(a)  there  exist  w'l,  C'^  and  S'^  such  that  (Di,  Si  l±)  S2  l±)  S^:’)  — >•+  (C'l,  S'l  l±)  S2  l±)  S^), 

((cti.  Si),  (cti,  S'l),  true)  |=  G+  *  True  and  (cri,  w'l, C'l,  S'l)  ^  q. 

Since  (172,  S2)  |=  /'  *  true  and  I'  >G' ,  we  know: 

(((72,  S2),  ((72,  S2),  true)  ^  G'  *  True. 

Since  G+  ^  G,  we  know: 

((cTi  l±)  (72,  Si  l±)  S2),  (cTi  l±)  CT2,  S'^  l±)  S2),true)  |=  (G  *  G')^  *  True. 

Since  D  =  Di  l±)  D2,  we  know  D2  =  •  and  D  =  Di.  Thus  C'l  l±)  D2  =  C'l. 

Since  (cri,  rci,  C'l,  S'l)  ^  q,  we  get: 

(cr,  w'l  +  W2,  C'l  l±)  D2,  S'l  l±)  S2)  \=  q*  p'. 

(b)  there  exists  w[  such  that  ws  =  (w'i,0)  and  (cri,rt;i  -I- iCi,  Di,  Si)  \=  q. 

Since  (cr2,  rc2,  D2,  C2)  \=  p' ,  we  have 

(cr,  wi  +W2  -I- r(;'i,D,  S)  \=  q  *  p' . 

6.  for  any  ap  and  Tp,  if  (G,  a  l±)  ap)  — )■  abort, 

from  the  first  premise,  we  know:  (Di,  Si  l±)  S2  tt)  Si?)  — S-  +  abort.  Thus  D2  =  •  and  D  =  Di.  Thus 
(D,  E  l±)  Ei?)  — )■+  abort. 

Thus  we  are  done.  □ 
The FR-CONJ rule.

Lemma  46  (FR-CONJ).  If 

1.  R,G,I^{p}C{q}; 

2.  Sta({p,  q},  R*  Id);  Sta{p' ,  *  Id);  Sta(p',  G  *  True);  I  >  {R,  G};  pV  q  ^  I  *  true; 

then  R,  G,  /  |=  {p  ®  p'}G{q  ®  p'}. 

Proof:  We  want  to  prove:  for  all  a,  w,  D  and  E,  if  (cr,  ?r,D,  E)  \=p®p',  then 

R,G,I  \=  (G,  cr,  (0,  |G|))  ^height(C);u;;9®p'  ^1). 

Since  (ct,  w,]D),  E)  \=  p®p',  we  know:  there  exist  wi,  W2,  Di  and  D2  such  that 

(cr,  ICl,  Di,  E)  ^  p,  (cr,  W2,]D)2,E)  ^p',  W  =  Wi+W2,  D  =  DiI±)D2 
From  the  premise,  we  know:  R,  G,  I  |=  (G,  cr,  (0,  |G|))  ^height(C);mi;9  (Di,  E). 

By  Lemma  |T7l  we  are  done.  □ 

Lemma  47.  If 

1.  R,  G,/ ^  (G,cr,  wsi)^«;^i;q(Di,E); 

2.  Sta{q,  R  *  Id);  Sta(p',  i?+  *  Id);  Sta(p',  G  *  True);  I  >  {R,  G};  q^  I  *  true; 

3.  {a,  W2j  ID)2,  E)  \=  p' w  =  wi  +  W2]  D  =  Di  l±)  D2; 
then  R,  G,  I  \=  (G,  cr,  wsi)  din-,w-qiisp’  (D,  E). 

Proof:  By  co-induction.  From  the  premises,  we  know:  (cr,  E)  ^  J  *  true. 

1.  for  any  ap,  E_f,  G'  and  cr",  if  (G,  ct  I±)  ap)  — (G',ct"), 

from  the  first  premise,  we  know:  there  exists  cr'  such  that  a"  =  ct'  l±)  crp’,  and  one  of  the  following 
holds: 

(a)  there  exist  ws'i,  C'^  and  E'  such  that  (Di,  E  tt)  E^)  — >•+  (C'l,  E'  l±)  E^), 

((CT,E),(CT',E'),true)  ^  G+  *  True  and  R,G,/  |=  {C ,ws\)din-,wpq{C-\,T.'). 

Since  Sta(p',  G  *  True),  we  know 

Sta(p',  G’*'  *  True) 

Since  (ct,  W2,D2,E)  \=  p' ,  we  know  there  exists  W2  such  that 

(ct',u>2,D2,S')  \=p' 

Since  D  =  Di  l±)  ©2)  we  know  D2  =  •  and  D  =  Di.  Let  D'  =  l±)  D2  =  C)  and  w'  =  wi  +  W2- 

By  the  co-induction  hypothesis,  we  know 

R,  G,  J  h  (C',  ws[)  ^n-,w'-q&p'  (D',  E'). 

(b)  there  exists  ws'i  such  that  ws'i  <u  wsi, 

((cr,  E),  (cr',  E),  false)  |=  G+  *  True  and  R,  G,  /  ^  (G',  ct',  ws))  diH\wi-,q  (Di,  E). 

Since  (ct,  W2,D2,S)  \=  p'  and  Sta(p',  G  *  True),  we  know 

(ct',-u;2,D2,E)  ^p' 

By  the  co- induction  hypothesis,  we  know 

R,  G,  /  ^  (G',  cr',  ws'i)  diH-,w,q(Bp'  (D,  E). 
2. for any $\sigma_F$, $\Sigma_F$, $e$, $C'$ and $\sigma''$, if $(C, \sigma \uplus \sigma_F) \xrightarrow{e} (C', \sigma'')$, the proof is similar to the previous case.

3.  for  any  a'  and  S',  if  ((cr,  S),  (cr',  S'),  true)  \=  *  Id, 

from  the  first  premise,  we  know  there  exists  ws'i  such  that 

i?,  G,  /  h  (C,  a',  «;s'i)  (Di,  S'). 

Since  (cr,  W2,  D2,  S)  \=  p'  and  Sta(p',  *  Id),  we  know:  there  exists  such  that 

(cr',w^,D2,S')  \=p'. 

By  the  co- induction  hypothesis,  we  know:  let  w'  =  wi  +  W2, 

R,G,I  \=  (G,  cr',  ws[)  (D,  S'). 

4.  for  any  cr'  and  S',  if  ((cr,  S),  (cr',  S'),  false)  ^  *  Id, 

from  the  first  premise,  we  know 

i?,  G,  /  h  (C,  a',  wsi)  (Di,  S'). 

Since  (cr,  W2,D2,S)  \=  p'  and  Sta(p',i?+  *  Id),  we  know: 

(cr',  W2,D2,  s')  ^  p'. 

By  the  co-induction  hypothesis,  we  know: 

i?,  G,  /  ^  (G,  cr',  WSi)  (D,  S'). 

5.  if  G  =  skip,  then  for  any  T,p,  from  the  first  premise  we  know  one  of  the  following  holds: 

(a)  there  exist  w[,  C'^  and  S'  such  that  (Di,  S  l±)  Sp)  — >•+  (C'^,  S'  l±)  S/r), 

((cr,  S),  (cr,  S'),  true)  ^G+*True  and  (cr,  C'^,  S')  \=  q. 

Since  (cr,  ^2,  D2,  S)  \=  p'  and  Sta(p',  G  *  True),  we  know  there  exists  W2  such  that 

(ct,w'2,D2,S')  \=p' 

Since  D  =  Di  l±)  D2,  we  know  D2  =  •  and  D  =  Di.  Thus  l±)  D2  =  C'j^.  Thus  we  get: 

(cr,  w[  +  W2,C'i  l±)  D2,  S')  ^  g  ®  p'. 

(b)  there  exists  w'l  such  that  wsi  =  (w'i,0)  and  (cr,  wi  +  w(,Di,  S)  \=  q. 

Since  (cr,  ^2,  D2,  S)  \=  p' ,  we  have 

(cr,  W1+W2  +  w'i,D,  S)  ^  g  ®  p'. 

6.  for  any  ap  and  Sp,  if  (G,  cr  l±)  ap)  — s-  abort, 

from  the  first  premise,  we  know:  (ID)i,S  l±)  Sp)  — +  abort.  Thus  D2  =  •  and  D  =  Di.  Thus 
(D,  S  l±)  Si?)  — S-+  abort. 

Thus  we  are  done.  □ 




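5.5 Derivation of WHILE-TERM Rule

Lemma 48 (WHILE-TERM Derivable). If

1. $R, G, I \vdash \{p \land B \land (E = \alpha)\}C\{p \land (E < \alpha)\}$;

2. $p \land B \Rightarrow E > 0$;

3. $p \Rightarrow ((B = B) \land (E = E)) * I$;

4. $G^{+} \Rightarrow G$;

5. $\alpha$ is a fresh logical variable;

then $R, G, I \vdash \{\lfloor p \rfloor_{w}\}\ \mathbf{while}\ (B)\ C\ \{\lfloor p \rfloor_{w} \land \lnot B\}$.

Proof: Take a fresh logical variable $\beta$. By applying the CONSEQ rule to premise 1, we get:

$R, G, I \vdash \{\exists\beta.\ p \land (E = \beta) \land B \land (E = \alpha)\}C\{\exists\beta.\ p \land (E = \beta) \land (E < \alpha)\}$ (5.178)

From $p \land B \Rightarrow E > 0$, we know

$p \land B \land (E = \alpha) \Rightarrow \alpha > 0$ (5.179)

Since $G^{+} \Rightarrow G$, $\mathsf{Sta}(\mathsf{wf}(\alpha) \land \mathsf{emp}, \mathsf{Emp} * \mathsf{Id})$, $\mathsf{emp} \triangleright \mathsf{Emp}$ and $(\mathsf{wf}(\alpha) \land \mathsf{emp}) \Rightarrow \mathsf{emp} * \mathsf{true}$, we can apply
the FRAME rule to (5.178) and get

$R, G, I \vdash \{(\exists\beta.\ p \land (E = \beta) \land B \land (E = \alpha)) * (\mathsf{wf}(\alpha) \land \mathsf{emp})\}C\{(\exists\beta.\ p \land (E = \beta) \land (E < \alpha)) * (\mathsf{wf}(\alpha) \land \mathsf{emp})\}$ (5.180)

We reduce (5.180) as follows:

$R, G, I \vdash \{(\exists\beta.\ (p \land (E = \beta)) * (\mathsf{wf}(\alpha) \land \mathsf{emp})) \land B \land (E = \alpha)\}C\{(\exists\beta.\ (p \land (E = \beta)) * (\mathsf{wf}(\alpha) \land \mathsf{emp})) \land (E < \alpha)\}$ (5.181)

$R, G, I \vdash \{(\exists\beta.\ (p \land (E = \beta)) * (\mathsf{wf}(\beta) \land \mathsf{emp})) \land B \land (E = \alpha)\}C\{(\exists\beta.\ (p \land (E = \beta)) * (\mathsf{wf}(\beta + 1) \land \mathsf{emp})) \land (E < \alpha)\}$ (5.182)

Since $(\mathsf{wf}(\beta + 1) \land \mathsf{emp}) \Rightarrow (\mathsf{wf}(\beta) \land \mathsf{emp}) * (\mathsf{wf}(1) \land \mathsf{emp})$, we let

$p_0 = (\exists\beta.\ (p \land (E = \beta)) * (\mathsf{wf}(\beta) \land \mathsf{emp}))$ (5.183)

then (5.182) can be written as:

$R, G, I \vdash \{p_0 \land B \land (E = \alpha)\}C\{(p_0 * (\mathsf{wf}(1) \land \mathsf{emp})) \land (E < \alpha)\}$ (5.184)

By the exists rule, and since $\alpha$ is not free in $R$, $G$ and $I$, we get:

$R, G, I \vdash \{\exists\alpha.\ p_0 \land B \land (E = \alpha)\}C\{\exists\alpha.\ (p_0 * (\mathsf{wf}(1) \land \mathsf{emp})) \land (E < \alpha)\}$ (5.185)

Since $\alpha$ is not free in $p$, $B$ and $E$, we know

$(p_0 \land B) \Rightarrow (\exists\alpha.\ p_0 \land B \land (E = \alpha))$ (5.186)

and

$(\exists\alpha.\ (p_0 * (\mathsf{wf}(1) \land \mathsf{emp})) \land (E < \alpha)) \Rightarrow (p_0 * (\mathsf{wf}(1) \land \mathsf{emp}))$ (5.187)

Thus by applying the CONSEQ rule to (5.185), we get:

$R, G, I \vdash \{p_0 \land B\}C\{p_0 * (\mathsf{wf}(1) \land \mathsf{emp})\}$ (5.188)

From $p \Rightarrow (B = B) * I$ and $p_0 * (\mathsf{wf}(1) \land \mathsf{emp}) \land B \Rightarrow (p_0 \land B) * (\mathsf{wf}(1) \land \mathsf{emp})$, by applying the while
rule and the hide-w rule, we get:

$R, G, I \vdash \{\lfloor p_0 * (\mathsf{wf}(1) \land \mathsf{emp}) \rfloor_{w}\}\ \mathbf{while}\ (B)\ C\ \{\lfloor p_0 * (\mathsf{wf}(1) \land \mathsf{emp}) \rfloor_{w} \land \lnot B\}$ (5.189)

It can be reduced to:

$R, G, I \vdash \{\exists\beta.\ \lfloor p \rfloor_{w} \land (E = \beta)\}\ \mathbf{while}\ (B)\ C\ \{\exists\beta.\ \lfloor p \rfloor_{w} \land (E = \beta) \land \lnot B\}$ (5.190)

Since $p \Rightarrow (E = E) * I$, we know

$R, G, I \vdash \{\lfloor p \rfloor_{w}\}\ \mathbf{while}\ (B)\ C\ \{\lfloor p \rfloor_{w} \land \lnot B\}$ (5.191)

Thus we are done. □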
Abstract

Recent ground-breaking efforts such as CompCert have made a
convincing case that mechanized verification of compiler correctness
for realistic C programs is both viable and practical.
Unfortunately, existing verified compilers can only handle whole
programs. This severely limits their applicability and prevents the
linking of verified C programs with verified external libraries. In
this paper, we present a novel compositional semantics for reasoning
about open modules and for supporting verified separate compilation
and linking. More specifically, we replace external function
calls with explicit events in the behavioral semantics. We then
develop a verified linking operator that makes lazy substitutions on
(potentially reacting) behaviors by replacing each external function
call event with a behavior simulating the requested function.
Finally, we show how our new semantics can be applied to build
a refinement infrastructure that supports both vertical composition
and horizontal composition.

Categories and Subject Descriptors F.3.1 [Logics and Meanings
of Programs]: Specifying and Verifying and Reasoning about Programs;
D.3.4 [Programming Languages]: Processors - Compilers;
D.2.4 [Software Engineering]: Software/Program Verification -
Correctness proofs, formal methods

Keywords Compositional Semantics; Vertical Composition; Horizontal
Composition; Verified Compilation and Linking.

1.  Introduction 

Compiler verification has long been considered a theoretically
deep and practically important research subject. It addresses the
very question of program equivalence (or simulation), a primary
reason that we need to define formal semantics for programming
languages. It is important for practical software developers since
compiler bugs can lead to the silent generation of incorrect programs,
which could lead to unexpected crashes and security holes.

Recent work on CompCert [12, 11] has shown that mechanized
verification of compiler correctness for C is both viable and
practical, and the resulting compiler is indeed empirically much
more reliable than traditional (unverified) ones [22]. The success of
CompCert can be partly attributed to its use of simple (small-step
and/or big-step) operational semantics [14], a shared behavioral
specification language (capable of describing terminating, stuck,
silently diverging, and reacting behaviors), and a unified C memory
model [13] for all of its compiler intermediate languages. The
simplicity of the CompCert semantics made it possible and practical
to mechanically verify the correctness of many compilation
phases with a reasonable amount of effort.

Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.

CPP '15, January 13-14, 2015, Mumbai, India.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3296-5/15/01...$15.00.
http://dx.doi.org/10.1145/2676724.2693167

One important weakness of CompCert is that it can only handle
whole programs. This severely limits its applicability. A computer
program is often not just a single piece of code written and compiled
at once, but is instead obtained by compiling and linking different
modules, or compilation units, that can be originally written
in different programming languages, independently of each other.
From the compilation point of view, the final program is obtained
by linking different object files, each of which is either written
directly or obtained by compiling a source compilation unit. Different
compilers can be used for different modules.

From the program-verification point of view, a computer program
is almost never verified as a whole. Instead, for each compilation
unit, its source code (or object file, if written directly) is verified
independently from the implementation of the other modules. Without
support for separate compilation and linking, verified C programs,
even if correctly compiled by CompCert, cannot be linked
with verified external libraries.

An open problem for supporting verified separate compilation
and linking is to find a simple compositional semantics for open
modules and to specify and reason about such semantic behaviors
in a language-independent way. Following Hur et al. [9, 10], we
want to achieve compositionality in two dimensions:

• vertical composition corresponds to successive compilation
passes on a given compilation unit. Each compilation pass can
be an optimization to make a program more efficient while
staying at the same representation level, or a compilation phase
from one intermediate representation to another. The question is
how to define compositional semantics of intermediate programs in a
language-independent format so that we can show that each
compilation pass does not introduce unwanted behaviors.

• horizontal composition corresponds to the linking of different
modules at the same level (i.e., at the level of object files, or
at the same intermediate level). It corresponds to the notion of
program composition: local reasoning shall allow studying the
behavior of program components when placed in an abstractly
specified context. Conversely, when linked together,
compilation units play the role of contexts for other modules.
More generally, this notion becomes symmetric when modules
can mutually call functions in each other.

In this paper, we present a novel compositional semantics (for
open modules) that supports both vertical composition and horizontal
composition for C-like languages. Traditionally, operational
semantics focuses on reasoning about the behaviors of a whole program.
This partly explains why CompCert does not handle open
modules. A significant attempt toward developing compositional
semantics has been denotational semantics, and the underlying domain
theory has led to a wide body of research; however, denotational
models become difficult to extend as we add more language
features and they are harder to mechanize in a proof assistant.

Our paper makes the following contributions:

• We develop a compositional semantics (denoted as $[\![-]\!]_{\mathrm{comp}}$, see
Sec. 4) to help reason about open modules. Our key idea is to
model external function calls in the same way that compositional
semantics for concurrent languages [4] models environmental
transitions. The behavior of a call to an external function
$f$ is modeled as an event $\mathsf{Extcall}(f, m, m')$, with $m$ and
$m'$ denoting memory states before and after the call. A function
body that makes $n$ consecutive external calls can be modeled as
a sequence of event traces of the form
$\mathsf{Extcall}(f_1, m_1, m'_1) :: \mathsf{Extcall}(f_2, m_2, m'_2) :: \ldots :: \mathsf{Extcall}(f_n, m_n, m'_n)$,
with the assumption that segments between two external call events, e.g.,
$(m'_1, m_2)$ and $(m'_{n-1}, m_n)$, are transitions made by the function
body itself. We show how to extend the CompCert-style behavioral
semantics with these new external call events and how to
use a shared behavioral specification language (as in CompCert)
to support vertical compositionality.

• We develop a linking operator directly at the semantic level
(denoted as $\bowtie$, see Sec. 5), based on a resolution operator which
makes a lazy substitution on behaviors by replacing each external
function call event with a behavior simulating the requested
function. We show that applying the linking operator to the
compositional semantic objects ($\psi_1$ and $\psi_2$ for open modules
$u_1$ and $u_2$) will yield the same compositional semantic object
($\psi_1 \bowtie \psi_2$) for the linked module ($u_1 \uplus u_2$). Since linking is
directly done on semantic objects, our approach can also be applied
to components compiled from different source languages;
for example, a module $u_\alpha$ (in language A) can be compiled by
compiler $C_\alpha$ and linked with another module $u_\beta$ compiled by
compiler $C_\beta$, yielding a resulting binary with the semantic object
$[\![C_\alpha(u_\alpha)]\!]_{\mathrm{comp}} \bowtie [\![C_\beta(u_\beta)]\!]_{\mathrm{comp}}$.

• Thanks to this new compositional semantics and semantic linking,
we develop a refinement infrastructure (denoted as $\sqsubseteq$, see
Sec. 6) that unifies program verification and verified separate
compilation: each verification step, as well as each compilation
step, is actually a refinement step. The transitivity property of
our refinement relation implies vertical composition; and the
congruence property (a.k.a. monotonicity, see Theorem 2) implies
horizontal composition.

• Unlike the CompCert whole program semantics, which does not
expose memory states in its event traces, compositional semantics
for open modules may make part of the memory state observable
(e.g., as in an external call event $\mathsf{Extcall}(f, m, m')$).
This creates challenges for verifying compilation phases that
alter memory states. We introduce $\alpha$-refinement (denoted as
$\sqsubseteq_\alpha$, see Sec. 7), a generalization of $\sqsubseteq$ with a bijection $\alpha$ between
the source and the target memory states. We show how
$\alpha$-refinement can be used to verify the correctness of the
memory-changing phases in CompCert, and we have successfully
reimplemented (and verified in Coq) the Clight-to-Cminor phase
(the CompCert pass that uses the most sophisticated memory
injection relation) using $\alpha$-like memory bijection.
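
The idea of lazily substituting external call events can be illustrated with a small Python sketch. It is only a toy model under strong assumptions and is not the paper's Coq development: memory is a single integer, a function body is a Python generator, an external call event records only the memory before the call (the memory after the call is sent back into the generator), and the names `Out`, `Extcall`, `resolve`, `run`, `main` and `incr` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generator, List, Tuple, Union


@dataclass(frozen=True)
class Out:
    """An ordinary observable event (for example, printing one character)."""
    char: str


@dataclass(frozen=True)
class Extcall:
    """A request to call external function `func` in memory state `mem`."""
    func: str
    mem: int


Event = Union[Out, Extcall]
# A body yields events, receives the post-call memory after each Extcall,
# and finally returns its own resulting memory.
Body = Generator[Event, int, int]


def resolve(body: Body, defs: Dict[str, Callable[[int], Body]]) -> Body:
    """Replace each Extcall to a function in `defs` by that function's behavior.

    The substitution is lazy: a callee is only started when the caller
    actually reaches the corresponding Extcall event.
    """
    try:
        event = next(body)
        while True:
            if isinstance(event, Extcall) and event.func in defs:
                # Known function: splice in the callee's (resolved) events.
                mem_after = yield from resolve(defs[event.func](event.mem), defs)
                event = body.send(mem_after)
            elif isinstance(event, Extcall):
                # Still external: expose the event and let the context answer.
                mem_after = yield event
                event = body.send(mem_after)
            else:
                yield event
                event = next(body)
    except StopIteration as stop:
        return stop.value


def run(body: Body, env: Callable[[Extcall], int]) -> Tuple[List[Event], int]:
    """Collect the trace of a terminating body; `env` answers external calls."""
    trace: List[Event] = []
    try:
        event = next(body)
        while True:
            trace.append(event)
            if isinstance(event, Extcall):
                event = body.send(env(event))
            else:
                event = next(body)
    except StopIteration as stop:
        return trace, stop.value


def main(mem: int) -> Body:  # hypothetical open module calling incr and log
    yield Out("a")
    mem = yield Extcall("incr", mem)
    yield Out("b")
    mem = yield Extcall("log", mem)
    return mem


def incr(mem: int) -> Body:  # hypothetical module providing incr
    yield Out("i")
    return mem + 1
```

Running `main` alone exposes both calls as events, while `resolve(main(0), {"incr": incr})` replaces the call to `incr` by its behavior and leaves the call to `log` external. The sketch covers only terminating behaviors and does not model the refinement relations.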


All our proofs have been carried out in Coq [20] and can be
found at the companion web site [18]. The implementation includes
the generic compositional semantics and linking framework, an
instantiation of the framework for the common subexpression
elimination pass, and a new implementation of the CompCert memory
model with block tags and the Clight-to-Cminor compilation phase
using memory bijection.


2. Preliminary: small-step and big-step semantics

In this section, we define the general notion of small-step semantics,
or transition systems, and explain how to automatically construct
big-step semantics based on them. Throughout the paper,
when we define a small-step semantics, we always construct the
corresponding big-step semantics based on this section.

Small-step semantics describes how to execute programs in
minimal steps. Big-step semantics gives us the meaning of programs
as a whole. When studying the meaning of a program, we
focus not only on whether it terminates or diverges, but also on its
interaction with the outside environment through events like input
and output, network communications, etc. We borrow all these
definitions from the CompCert verified compiler [11].

Before diving into semantics, we first go through some notations
on sets, (finite) lists, and (infinite) streams. We use $X^?$ to denote the
set of all subsets of $X$ with 0 or 1 element. For any subset $Y \subseteq X^?$,
we liberally write $x \in Y$ instead of $\{x\} \in Y$. The standard notation
for power set $\mathcal{P}(X)$ is also used.

For any set $A$, $A^*$ denotes the set of finite lists of elements of
$A$. Such lists can be either empty ($\epsilon$) or nonempty ($x :: l$). For two
lists $l_1, l_2 \in A^*$, $l_1 \mathbin{+\!+} l_2$ is their concatenation. $A^\infty$ denotes the set
of infinite streams of $A$, which are defined coinductively such that
all elements are of the form $x :::: l$ where $x \in A$ and $l \in A^\infty$.
The coinductive definition allows (actually, requires) streams to be
infinite, in contrast to lists, which are defined inductively and must
be finite. A list $l$ of $A$ can also be prepended in front of a stream $l'$ of $A$,
which again yields a stream. We write $l_1 \sim l_2$ to mean that two streams are bisimilar
(coinductively, $\exists x, \forall i = 1, 2, \exists l'_i, l_i = x :::: l'_i$ and $l'_1 \sim l'_2$).

Definition 1 (Small-step semantics). A small-step semantics (or a
transition system) is a tuple $S = (\mathcal{E}, \mathcal{S}, \rightarrow, \mathcal{R}, \mathcal{F})$ where:

• $\mathcal{E}$ is the set of events.

• $\mathcal{S}$ is the set of configurations (or states).

• $(\rightarrow) \subseteq \mathcal{S} \times \mathcal{E}^? \times \mathcal{S}$ is the transition relation, usually written in
infix forms $s \rightarrow s'$ and $s \xrightarrow{e} s'$. We say that $s$ makes one step (or
transition) to $s'$, producing an event $e$ (if any). A step producing
no event is silent.

• $\mathcal{R}$ is the set of results.

• $\mathcal{F} \subseteq (\mathcal{S} \times \mathcal{R})$ is a relation associating final states with results.
A configuration $s$ is said to be final with result $r$ if, and only if,
$(s, r) \in \mathcal{F}$.


The transition relation may be nondeterministic: for a given
configuration $s$, there can be several possible configurations $s'$ such
that $s \rightarrow s'$ (or $s \xrightarrow{e} s'$ for some event $e$).

Then, a configuration $s$ can make several transitions to $s'$
producing a finite list $\sigma$ of events in $\mathcal{E}$, which we write $s \xrightarrow{\sigma}{}^{*} s'$ (or $s \xrightarrow{\sigma}{}^{+} s'$
if there is at least one step) and define as the reflexive-transitive
(resp. transitive) closure of the transition step relation:

$$\frac{}{s \xrightarrow{\epsilon}{}^{*} s} \qquad \frac{s \xrightarrow{\sigma_1}{}^{*} s_1 \quad s_1 \xrightarrow{\sigma_2} s_2}{s \xrightarrow{\sigma_1 \mathbin{+\!+} \sigma_2}{}^{+} s_2} \qquad \frac{s \xrightarrow{\sigma}{}^{+} s'}{s \xrightarrow{\sigma}{}^{*} s'}$$

We can then define the behavior of a transition system from an
initial state $s_0 \in \mathcal{S}$.


• It can perform finitely many transition steps to some final
configuration $s'$ such that $(s', r) \in \mathcal{F}$ for some $r$. In this case, we
say that it is a terminating behavior. For such a behavior, we
record the result $r$ and its trace of events, a finite list, produced
to go from $s_0$ to $s'$.

• It can perform finitely many transition steps to some non-final
configuration $s'$ from which no step is possible. In this
case, we say that it is a going-wrong (or stuck) behavior, for
which we record the trace produced from $s_0$ to $s'$. In practice, $s'$
corresponds to a configuration requesting an invalid operation
such as out-of-bounds array access or division by zero.

• It can perform infinitely many transition steps. For such cases,
we need to distinguish whether a finite or infinite list of events
is produced during these transition steps. (1) In the finite event
case, finitely many steps are performed to some state $s'$ from
which infinitely many silent transitions are performed. (2) In
the infinite event case, starting from any state, a non-silent
transition can always be reached within finitely many steps; we
record the trace as an infinite stream of events.


int main() { printf("a"); return 2; }

has the behavior $\mathsf{OUT}(a) :: \epsilon \downarrow(2)$.

int main() { printf("a"); 3/0; return 4; }

has the behavior $\mathsf{OUT}(a) :: \epsilon\ \lightning$.

int main() { printf("a"); while (1) {} }

has the behavior $\mathsf{OUT}(a) :: \epsilon \nearrow$.

int main() { while (1) printf("b"); }

has the behavior $\mathsf{OUT}(b) :::: \ldots :::: \mathsf{OUT}(b) :::: \ldots$.

Lemma 1. For any configuration $s_0 \in S$, the transition system has at least one behavior from $s_0$: $(\!|\mathcal{S}|\!)(s_0) \neq \emptyset$.

Proof. Done in CompCert [11]. Requires the excluded middle to distinguish whether the program has a finite or an infinite sequence of steps, and an axiom of constructive indefinite description to construct the infinite event sequence in the reacting case. □


The  bullets  above  are  formally  defined  as  follows. 

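Definition 2 (Behaviors). Given a set of events $E$ and a set of results $R$, we define the set of behaviors $B$ as follows:

$$
\begin{array}{rcll}
b \in B & ::= & \sigma \downarrow (r) \quad (\sigma \in E^{*},\ r \in R) & \text{Terminating behavior} \\
 & \mid & \sigma\,\lightning \quad (\sigma \in E^{*}) & \text{Going-wrong behavior} \\
 & \mid & \sigma \nearrow \quad (\sigma \in E^{*}) & \text{Diverging behavior (finitely many events, then silently diverges)} \\
 & \mid & \varrho \quad (\varrho \in E^{\omega}) & \text{Reacting behavior (diverging with infinitely many events)}
\end{array}
$$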


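The concatenation of an event $e$ (resp. of an event list $\sigma$) and a behavior $b$ is written $e \cdot b$ (resp. $\sigma \bullet b$) and defined in a straightforward way:

$$
\begin{array}{rcl}
e \cdot (\sigma \downarrow (r)) & = & (e :: \sigma) \downarrow (r) \\
e \cdot (\sigma\,\lightning) & = & (e :: \sigma)\,\lightning \\
e \cdot (\sigma \nearrow) & = & (e :: \sigma) \nearrow \\
e \cdot \varrho & = & e :::: \varrho \\[4pt]
\varepsilon \bullet b & = & b \\
(e :: \sigma) \bullet b & = & e \cdot (\sigma \bullet b)
\end{array}
$$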
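The concatenation equations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: events are plain strings, results are integers, the class names `Terminates`, `GoesWrong` and `Diverges` are hypothetical, and reacting behaviors (infinite streams) are left out because they cannot be stored as finite tuples.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Trace = Tuple[str, ...]  # finite event list (assumption: events are strings)


@dataclass(frozen=True)
class Terminates:
    trace: Trace
    result: int


@dataclass(frozen=True)
class GoesWrong:
    trace: Trace


@dataclass(frozen=True)
class Diverges:
    trace: Trace


Behavior = Union[Terminates, GoesWrong, Diverges]


def cons(e: str, b: Behavior) -> Behavior:
    """e . b : prepend one event to the finite trace of b."""
    if isinstance(b, Terminates):
        return Terminates((e,) + b.trace, b.result)
    return type(b)((e,) + b.trace)


def concat(sigma: Trace, b: Behavior) -> Behavior:
    """sigma * b, following the two equations:
    eps * b = b   and   (e :: s) * b = e . (s * b)."""
    if not sigma:
        return b
    return cons(sigma[0], concat(sigma[1:], b))
```

For instance, `concat(("OUT(a)",), Terminates((), 2))` builds the terminating behavior whose trace is the single event `OUT(a)` and whose result is 2.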


Definition 3 (Stuck, silently diverging, reacting states). Given a small-step semantics $\mathcal{S} = (E, S, \to, R, \mathcal{F})$, a configuration $s$ is:

• stuck (written $s\,\lightning$) if, and only if, there is no $s'$ such that $s \to s'$, and there are no $e \in E$ and $s'$ such that $s \xrightarrow{e} s'$.

• silently diverging (written $s \nearrow$) if, and only if, coinductively, there is a configuration $s'$ such that $s \to s'$ and $s' \nearrow$.

• reacting with the infinite event stream $\varrho$ (written $s \Uparrow \varrho$) if, and only if, coinductively, there is a nonempty finite event list $\sigma$ and a configuration $s'$ such that $s \xrightarrow{\sigma}{}^{+} s'$, and an infinite event stream $\varrho'$ such that $s' \Uparrow \varrho'$ and $\varrho = \sigma \bullet \varrho'$.


If the transition relation is nondeterministic, the transition system may have several behaviors from a single initial state. We want to describe the set of all the possible behaviors of the transition system from a given configuration.


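Definition 4 (Big-step semantics). Given a small-step semantics $\mathcal{S} = (E, S, \to, R, \mathcal{F})$, the big-step semantics $(\!|\mathcal{S}|\!)$ of $\mathcal{S}$ is a function from $S$ to $\mathcal{P}(B)$ such that, for each configuration $s_0$, $(\!|\mathcal{S}|\!)(s_0)$ is the set of all possible behaviors from $s_0$, defined as follows:

$$
\begin{array}{rcl}
(\!|\mathcal{S}|\!)(s_0) & = & \{\sigma \downarrow (r) \;:\; s_0 \xrightarrow{\sigma}{}^{*} s \wedge (s, r) \in \mathcal{F}\} \\
 & \cup & \{\sigma\,\lightning \;:\; s_0 \xrightarrow{\sigma}{}^{*} s \wedge s\,\lightning\} \\
 & \cup & \{\sigma \nearrow \;:\; s_0 \xrightarrow{\sigma}{}^{*} s \wedge s \nearrow\} \\
 & \cup & \{\varrho \;:\; s_0 \Uparrow \varrho\}
\end{array}
$$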
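To make the four kinds of behaviors concrete, here is a minimal Python sketch that big-steps a deterministic, finite-state transition system. It is not the Coq development: the helper names `machine` and `big_step` are hypothetical, states must be hashable, and the cycle check is only sound because the system is assumed deterministic (revisiting a state then means looping forever). A reacting behavior is reported as a finite prefix plus the event cycle that repeats.

```python
def machine(steps, finals):
    """Build (step, final) from two dicts.
    steps:  state -> (event_or_None, next_state); a missing state has no step.
    finals: state -> result; a missing state is not final."""
    return steps.get, finals.get


def big_step(step, final, s0, fuel=1000):
    """Classify the unique behavior of a deterministic system from s0."""
    trace, seen, s = [], {}, s0
    for _ in range(fuel):
        if final(s) is not None:
            return ("terminates", tuple(trace), final(s))
        if s in seen:
            k = seen[s]                  # trace length at the first visit
            if k == len(trace):          # no event since then: silent loop
                return ("diverges", tuple(trace))
            return ("reacts", tuple(trace[:k]), tuple(trace[k:]))
        seen[s] = len(trace)
        nxt = step(s)
        if nxt is None:                  # non-final and no step: stuck
            return ("goes_wrong", tuple(trace))
        event, s = nxt
        if event is not None:
            trace.append(event)
    return ("unknown", tuple(trace))     # fuel exhausted, no verdict


# The four one-line C programs of this section, as program-counter machines.
returns_2 = machine({0: ("OUT(a)", 1)}, {1: 2})
div_by_zero = machine({0: ("OUT(a)", 1)}, {})
silent_loop = machine({0: ("OUT(a)", 1), 1: (None, 1)}, {})
print_loop = machine({0: ("OUT(b)", 0)}, {})
```

Calling `big_step(*silent_loop, 0)` reports a diverging behavior with trace `("OUT(a)",)`, matching the third C example.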


Below are some C examples showing the use of behaviors. The command printf("a");, printing the character “a” on screen, produces an observable event OUT(a). The results are int values.


3.  Starting  point:  a  language  with  function  calls 

Our  work  studies  a  semantic  notion  of  linking  two  compilation 
units  at  the  level  of  their  behaviors,  independently  of  the  languages 
in  which  they  are  defined.  We  first  show  how  to  derive  a  set  of 
behaviors  for  an  open  module  from  a  language  with  function  calls. 

In  this  section,  we  first  describe  our  starting  point,  the  semantics 
of  a  language  with  function  calls.  For  now,  we  consider  only  whole 
programs.  Then  we  will  show  in  Sec.  4  how  to  make  its  semantics 
compositional  and  suitable  for  open  modules. 

Our starting point language makes a memory state evolve throughout the whole program execution across function calls, and a local state (e.g. local variables) evolve within each function call. When a function returns, we consider that its result is the new memory state obtained at the end of the execution of the function, just before it hands over to its caller. Our Coq development also features argument passing and return values, but for the sake of presentation, we do not mention them here. See Sec. 6.5 for more details about our Coq implementation.

The  key  point  of  the  semantics  of  our  language  is  that  the  local 
state  of  a  function  call  cannot  be  changed  by  other  function  calls: 
when  a  function  is  called,  the  local  state  of  the  caller  is  “frozen” 
until  the  callee  returns. 

In  this  section,  we  consider  the  semantics  of  a  whole  program, 
which  does  not  contain  external  function  calls.  A  program  consists 
of  several  functions.  We  are  interested  in  the  behaviors  of  each 
function  in  the  program  for  all  memory  states  under  which  the 
function  is  called. 

A program is modeled as a partial function from function names to code. Let $p$ be a program and $f$ be a function name; $p(f)$ is then the body of $f$. When $f$ is called under a memory state $m$, an initial local state $\mathrm{Init}(p(f), m)$ is first created from the code $p(f)$. The local state and the memory state evolve together by performing local transition steps, which can produce some events. Eventually, the local state may correspond to a return state, meaning that the execution of the function has reached termination. The memory state is then the result of the function call.

But a local state $l$ can also correspond to calling some other function $f'$: in that case, $l$ is first saved into a continuation frame $k = \mathrm{Backup}(m, l)$ that is put on top of a continuation stack, then the function $f'$ is called and run. If the execution of this callee reaches a return state, with a new memory state $m'$, then the execution goes back to the caller by retrieving the frame $k$ from the stack and constructing a new local state $\mathrm{Restore}(m', k)$.

So, to obtain the behaviors of a function $f$ under a memory state $m$, we just have to big-step such a small-step semantics. Note that when an execution reaches a configuration where the local state is a return state and the stack is empty, the function is done executing and there is no caller to return to; hence it is the final configuration for a function execution. The result of the execution is the memory state of such a configuration.

Definition 5 (Language with function calls). A language with function calls is a tuple:

$$\mathcal{L} = (F, C, MS, LS, \mathrm{Init}, \mathrm{Kind}, E, \to, K, \mathrm{Backup}, \mathrm{Restore})$$

• $F$ is the set of function names.

• $C$ is the set of pieces of code corresponding to the bodies of functions (i.e., the syntax of the language).

• $MS$ is the set of memory states.

• $LS$ is the set of local states.

• $\mathrm{Init} : (C \times MS) \to LS$ is a total function that gives the initial local state when starting to execute a function body.

• $\mathrm{Kind}$ is a total function such that, for any local state $l \in LS$, $\mathrm{Kind}(l)$ may be either:

  ■ $\mathrm{Call}(f)$ to say that $l$ corresponds to calling a function $f \in F$. Then we define $LS_{Call} = \{l : \exists f,\ \mathrm{Kind}(l) = \mathrm{Call}(f)\}$.

  ■ $\mathrm{Return}$ to say that $l$ is a return state.

  ■ $\mathrm{Normal}$: none of the above. Then, we define $LS_{Normal} = \{l : \mathrm{Kind}(l) = \mathrm{Normal}\}$.

• $E$ is the set of events.

• $(\to) \subseteq (MS \times LS_{Normal}) \times E^{?} \times (MS \times LS)$ is the internal step relation, usually written in infix forms $(m, l) \to (m', l')$ and $(m, l) \xrightarrow{e} (m', l')$.

• $K$ is the set of continuation stack frames.

• $\mathrm{Backup} : (MS \times LS_{Call}) \to K$ is a total function that saves the current local state into a stack frame upon function call.

• $\mathrm{Restore} : (MS \times K) \to LS$ is a total function that restores a new local state from a stack frame upon callee return.

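Definition 6. Let $\mathcal{L}$ be a language with function calls. A program is a partial function from function names to code.

Definition 7 (Procedural semantics). Let $\mathcal{L}$ be a language with function calls and $p$ be a program in $\mathcal{L}$. The procedural small-step semantics $\mathrm{Proc}[\mathcal{L}, p]$ is defined as follows:

• The set of events is $E$.

• The set of configurations is $MS \times LS \times K^{*}$.

• The transition relation $(\mathcal{L}, p) \vdash \cdot \to \cdot$ is defined as follows:

$$
\frac{\mathrm{Kind}(l) = \mathrm{Normal} \qquad (m, l) \to (m', l')}{(\mathcal{L}, p) \vdash (m, l, \kappa) \to (m', l', \kappa)}
\qquad
\frac{\mathrm{Kind}(l) = \mathrm{Normal} \qquad (m, l) \xrightarrow{e} (m', l') \qquad e \in E}{(\mathcal{L}, p) \vdash (m, l, \kappa) \xrightarrow{e} (m', l', \kappa)}
$$

$$
\frac{\mathrm{Kind}(l) = \mathrm{Call}(f) \qquad p(f) = c \qquad l' = \mathrm{Init}(c, m) \qquad k = \mathrm{Backup}(m, l)}{(\mathcal{L}, p) \vdash (m, l, \kappa) \to (m, l', k :: \kappa)}
\qquad
\frac{\mathrm{Kind}(l) = \mathrm{Return} \qquad l' = \mathrm{Restore}(m, k)}{(\mathcal{L}, p) \vdash (m, l, k :: \kappa) \to (m, l', \kappa)}
$$

• The set of results is $MS$.

• The final configurations with result $m$ are the configurations $(m, l, \varepsilon)$ where $\mathrm{Kind}(l) = \mathrm{Return}$.

Let $B$ be the set of behaviors on events $E$. The procedural big-step semantics of $p$ is the function $[\![p]\!] : \mathrm{dom}(p) \to MS \to \mathcal{P}(B)$ obtained from big-stepping the procedural small-step semantics:

$$[\![p]\!](f)(m) = (\!|\mathrm{Proc}[\mathcal{L}, p]|\!)(m, \mathrm{Init}(p(f), m), \varepsilon)$$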
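The procedural rules can be exercised on a toy instance of a language with function calls. The Python sketch below is purely illustrative and makes several assumptions that are not in the paper: memory states are integers, a local state is a pair (code, program counter), the instruction set is the hypothetical `("emit", e)`, `("add", n)`, `("call", f)` and `("ret",)`, and a stack frame is simply the saved local state. The execution is deterministic here, so `run` returns a single behavior rather than a set.

```python
def init(code, m):
    return (code, 0)                       # Init: start at the first instruction


def kind(l):
    code, pc = l
    if pc >= len(code) or code[pc][0] == "ret":
        return ("Return",)
    if code[pc][0] == "call":
        return ("Call", code[pc][1])
    return ("Normal",)


def backup(m, l):
    return l                               # Backup: the frame is the frozen local state


def restore(m, k):
    code, pc = k
    return (code, pc + 1)                  # Restore: resume after the call instruction


def local_step(m, l):
    """Internal step relation on Normal local states: (event or None, m', l')."""
    code, pc = l
    op, arg = code[pc]
    if op == "emit":
        return arg, m, (code, pc + 1)
    if op == "add":
        return None, m + arg, (code, pc + 1)
    raise ValueError(f"unknown instruction {op!r}")


def proc_step(p, config):
    """One step of Proc[L, p]; None when no rule applies."""
    m, l, stack = config
    k = kind(l)
    if k[0] == "Normal":
        event, m2, l2 = local_step(m, l)
        return event, (m2, l2, stack)
    if k[0] == "Call":
        if k[1] not in p:
            return None                    # callee outside dom(p): stuck
        return None, (m, init(p[k[1]], m), (backup(m, l),) + stack)
    if stack:                              # Return with a caller waiting
        return None, (m, restore(m, stack[0]), stack[1:])
    return None                            # Return with empty stack: final


def run(p, f, m, fuel=1000):
    config, trace = (m, init(p[f], m), ()), []
    for _ in range(fuel):
        m, l, stack = config
        if kind(l) == ("Return",) and not stack:
            return ("terminates", tuple(trace), m)
        nxt = proc_step(p, config)
        if nxt is None:
            return ("goes_wrong", tuple(trace))
        event, config = nxt
        if event is not None:
            trace.append(event)
    return ("out_of_fuel", tuple(trace))


prog = {
    "main": (("emit", "OUT(a)"), ("call", "inc"), ("emit", "OUT(b)"), ("ret",)),
    "inc": (("add", 1), ("ret",)),
}
```

Running `run(prog, "main", 0)` follows the call rule into `inc`, the return rule back into `main`, and ends in a final configuration whose memory state is the result.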

Note that a function call is only triggered and the function name resolved when Kind returns $\mathrm{Call}(f)$, which depends solely on local states. It does not matter how the calling request gets put into the local state. Whether it originates from the code (a direct call) or was prepared by the caller in the memory state and only now moved to the local state (an indirect call), the procedural semantics handles both the same way. This means that our setting transparently handles C-style higher-order function pointers, without having to provide a special case for them.

4.  Compositional  semantics 

The procedural semantics given so far can only describe the behaviors of a closed program $p$. What if $p$ calls a function outside of its domain? By definition, the execution goes wrong. As such, the procedural semantics alone is not compositional, and it is not enough to describe the behaviors of open modules (or compilation units).

In this section, we are going to make our procedural semantics compositional by extending it with a rule to handle external function calls, i.e. calls to functions that are not defined in the module.

This compositional semantics represents external function calls as events. We will later link two compilation units at the behavior level by replacing each external function call event with the behaviors of the callee (see Sec. 5).

The  key  idea  of  our  compositional  semantics  is  not  to  get  stuck 
whenever  a  module  calls  an  external  function;  instead,  it  produces 
a  new  form  of  event  to  record  the  external  function  call.  This  is 
consistent  with  the  idea  that  events  represent  the  interaction  of  a 
compilation  unit  with  the  outside  environment:  for  an  open  module, 
external  functions  remain  part  of  the  outside  environment  until  their 
implementation  is  provided  by  linking.  These  external  call  events 
are  the  minimal  amount  of  syntax  necessary  to  model  external 
function  calls  at  the  level  of  behaviors. 

For each external function call event, we record (1) the function name; (2) the memory state before the call, because the external call depends on the memory state under which it is called; and (3) the memory state after the function call. The external function may change the memory state arbitrarily, which the caller cannot control; but the behavior of the caller depends on how the callee changed the memory state.

For regular input, CompCert produces an ordinary event containing the value read from the environment. CompCert cannot predict the value, so it provides a behavior for every possible input. Which behavior appears at runtime will depend on the actual value read. More precisely, when a configuration requests an input, it must resort to nondeterminism and provide transition steps producing events for all possible values. Letting the event carry the value makes it possible to have different follow-up events or even termination status depending on the actual value read. For instance, in C, the command scanf("%d", &i) reads a value $j$ from the keyboard and stores it into the variable i. CompCert models it as follows: for every integer $j$, there is a transition producing the event $\mathrm{IN}(j)$ asserting that $j$ is the value read. The following C code:

    int main() {
        int i = 0;
        scanf("%d", &i);
        printf("%d", (i % 2));
        return 0;
    }

will produce the set of behaviors¹:

$$\{\mathrm{IN}(j) :: \mathrm{OUT}(j \bmod 2) :: \varepsilon \downarrow (0) \;:\; j \in [\mathtt{INT\_MIN}, \mathtt{INT\_MAX}]\}$$
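As a small illustration of "one behavior per possible input", the Python sketch below enumerates this set over a reduced integer range. The function names `c_rem` and `scanf_behaviors` are hypothetical, and the range bounds are parameters because enumerating all of [INT_MIN, INT_MAX] would be impractical. One interpretive choice is flagged: the sketch follows C's truncating `%` operator (so a negative odd input prints -1), whereas the formula above simply writes "mod".

```python
import math


def c_rem(a: int, b: int) -> int:
    """Remainder with the sign of the dividend, as C's % operator computes it."""
    return int(math.fmod(a, b))


def scanf_behaviors(lo: int, hi: int):
    """One terminating behavior (trace, result) per possible input j in [lo, hi]."""
    return {
        ((f"IN({j})", f"OUT({c_rem(j, 2)})"), 0)
        for j in range(lo, hi + 1)
    }
```

For example, `scanf_behaviors(-1, 1)` contains three behaviors, one for each of the inputs -1, 0 and 1; the event `IN(j)` is what lets the later `OUT` event differ from one behavior to the next.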

We apply the same technique on external function calls. As the caller cannot predict how the callee will modify the memory state, the new memory state upon return of the function call is considered as an external input from the environment. This is why an external function call event stores both the new and old memory states.

¹ INT_MIN and INT_MAX are the least and the greatest values of type int.

Then, when an external function $f$ is called under some memory state $m_1$, for any possible memory state $m_2$ representing the memory state after returning from the external function, the compositional semantics will allow a transition (see rule EXTCALL in Def. 10 below) to produce the external function call event $\mathrm{Extcall}(f, m_1, m_2)$. Consequently, the caller will be able to provide a behavior for each possible memory state $m_2$.

This  leads  us  to  extending  the  events  and  transition  rules. 

Definition 8 (Extended events). Let $\mathcal{L}$ be a language with function calls. We write $\mathbb{E}$ for the set of extended events, defined as follows:

$$
\begin{array}{rcll}
\hat{e} \in \mathbb{E} & ::= & e \quad (e \in E) & \text{Regular event} \\
 & \mid & \mathrm{Extcall}(f, m_1, m_2) \quad (f \in F,\ m_1, m_2 \in MS) & \text{External function call}
\end{array}
$$

In $\mathrm{Extcall}(f, m_1, m_2)$, $m_1$ and $m_2$ are the memory states before and after the call, respectively. We write $\mathbb{B}$ for the set of behaviors on events $\mathbb{E}$, which we call extended behaviors.

If $F' \subseteq F$ is a set of function names, then we write $\mathbb{E}_{F'}$ for the set of extended events where all external function call events $\mathrm{Extcall}(f, m_1, m_2)$ have $f \in F'$, and $\mathbb{B}_{F'}$ for the set of behaviors on such events.

Definition 9 (Module or compilation unit). Let $\mathcal{L}$ be a language with function calls. A module, or compilation unit, is a partial function from function names to code².

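Definition 10 (Compositional semantics). The compositional small-step semantics $\mathrm{Comp}[\mathcal{L}, u]$ of a compilation unit $u$ is the small-step semantics defined as follows:

• The set of events is $\mathbb{E}$.

• The set of configurations is $MS \times LS \times K^{*}$ (as in the procedural small-step semantics).

• The transition relation $(\mathcal{L}, u) \vdash_{comp} \cdot \to \cdot$ is defined as follows:

$$
\frac{(\mathcal{L}, u) \vdash (m, l, \kappa) \to (m', l', \kappa')}{(\mathcal{L}, u) \vdash_{comp} (m, l, \kappa) \to (m', l', \kappa')}
\qquad
\frac{(\mathcal{L}, u) \vdash (m, l, \kappa) \xrightarrow{e} (m', l', \kappa') \qquad e \in E}{(\mathcal{L}, u) \vdash_{comp} (m, l, \kappa) \xrightarrow{e} (m', l', \kappa')}
$$

$$
\frac{\mathrm{Kind}(l) = \mathrm{Call}(f) \qquad f \notin \mathrm{dom}(u) \qquad e = \mathrm{Extcall}(f, m, m') \qquad l' = \mathrm{Restore}(m', \mathrm{Backup}(m, l))}{(\mathcal{L}, u) \vdash_{comp} (m, l, \kappa) \xrightarrow{e} (m', l', \kappa)}
\quad (\mathrm{EXTCALL})
$$

• As in the procedural semantics, the set of results is $MS$ and the final configurations with result $m$ are the configurations $(m, l, \varepsilon)$ where $\mathrm{Kind}(l) = \mathrm{Return}$.

The compositional big-step semantics of $u$ is the function $[\![u]\!]_{comp} : \mathrm{dom}(u) \to MS \to \mathcal{P}(\mathbb{B}_{F \setminus \mathrm{dom}(u)})$ obtained from big-stepping the compositional small-step semantics:

$$[\![u]\!]_{comp}(f)(m) = (\!|\mathrm{Comp}[\mathcal{L}, u]|\!)(m, \mathrm{Init}(u(f), m), \varepsilon)$$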
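The effect of rule (EXTCALL) on the set of behaviors can be illustrated with a small Python enumerator. This is a sketch under strong assumptions, none of which come from the paper: the set of memory states `MEM` is a tiny finite set, code is straight-line and non-recursive, the hypothetical instructions are `("emit", e)`, `("set", v)` and `("call", f)`, and only terminating behaviors are produced, directly in big-step style rather than through the small-step relation.

```python
MEM = (0, 1, 2)   # assumed finite set of memory states


def comp_behaviors(u, f, m):
    """Terminating extended behaviors of u[f] from memory m, as (trace, result)."""

    def run(code, m):
        if not code:
            yield (), m
            return
        (op, arg), rest = code[0], code[1:]
        if op == "emit":
            for t, r in run(rest, m):
                yield (arg,) + t, r
        elif op == "set":
            yield from run(rest, arg)
        elif op == "call" and arg in u:
            # Internal call: run the callee, then continue with its final memory.
            for t1, m1 in run(u[arg], m):
                for t2, r in run(rest, m1):
                    yield t1 + t2, r
        elif op == "call":
            # EXTCALL: the caller cannot predict the memory after the call,
            # so it offers one behavior per possible memory state m2.
            for m2 in MEM:
                event = ("Extcall", arg, m, m2)
                for t, r in run(rest, m2):
                    yield (event,) + t, r

    return set(run(u[f], m))


open_unit = {"f": (("call", "g"), ("emit", "OUT(x)"))}
closed_unit = {"f": (("call", "g"), ("emit", "OUT(x)")), "g": (("set", 2),)}
```

With `open_unit`, where `g` is undefined, `comp_behaviors(open_unit, "f", 0)` has one behavior per guess of the memory state after the call; with `closed_unit` the call is internal and a single behavior remains.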


5.  Linking 

In this section, we are going to define a linking operator $\bowtie$ between two partial functions from $F$ to $(MS \to \mathcal{P}(\mathbb{B}))$. This linking operator will be defined directly at the level of the behaviors, independent of the underlying languages that the modules are written in.

² Mathematically, modules and programs in $\mathcal{L}$ are the same. But conceptually, a program is intended to be stand-alone, and is not expected to call functions that are not defined within itself, contrary to a compilation unit, which we view as an open module.

(a)  △△△△△ ↓ e □□□□□   →   △△△△△ e ↓ □□□□□

(b)  △△△△△ ↓ Extcall(f, m1, m2) □□□□□   →   △△△△△ Extcall(f, m1, m2) ↓ □□□□□

(c)  △△△△△ ↓ Extcall(f, m1, m2) □□□□□   →   △△△△△ ↓ ○○○○○ □□□□□

Figure 1. Three cases in behavior simulation: (a) regular event; (b) $f \notin \mathrm{dom}(\psi)$; (c) ○○○○○ $\in \psi(f)$.

Intuitively, each event corresponding to an external function call will be replaced with the behavior of the callee. However, plain straightforward substitution is not enough, as the behaviors of a compilation unit $u_1$ can involve external calls to functions defined in the other compilation unit $u_2$ that can again involve external calls to functions back in $u_1$. So, we have to resolve those formerly external calls that are now internal, namely the cross-calls between the two compilation units $u_1$ and $u_2$.

Let $F' \subseteq F$ be a set of function names. We are going to consider the functions in $\Psi(F') = F' \to (MS \to \mathcal{P}(\mathbb{B}))$ that describe the behaviors of functions of $F'$. These functions may call some “external” functions which might still be in $F'$. We call the elements of $\Psi(F')$ open observations, which we usually get by taking disjoint unions of multiple compilation unit semantics.

Let $\psi$ be such an open observation. We resolve the external calls (in $\psi$) to functions of $F'$ by recursively supplying $\psi$ to do the substitution, yielding an observation $\mathcal{R}(\psi)$ in the set $\Theta(F') = F' \to (MS \to \mathcal{P}(\mathbb{B}_{F \setminus F'}))$ of closed observations, where there are no remaining external function call events to functions in $F'$. We shall formally define $\mathcal{R}$ in Definition 12.

Finally, if $\psi_1$ and $\psi_2$ are observations with disjoint domains, then we define the linking operator as $\psi_1 \bowtie \psi_2 = \mathcal{R}(\psi_1 \uplus \psi_2)$.

5.1  Internal  call  resolution  by  behavior  simulation 

Let $\psi \in \Psi(F')$ be an open observation. To resolve its internal function calls, we are going to define a semantics that will actually simulate the behaviors of $\psi$.

This resolution cancels out matching external call events by inlining the callee's behavior. We define this resolution by simulating the local behaviors of each module through a small-step semantics, treating each “external call” event through one “computation” step.

The simulation process is shown in Fig. 1. In each case, ↓ can be seen as a cursor behind which lies the next event to be simulated. Each step ($\cdot \to \cdot$) of the behavior simulation progresses based on the next event. All regular events are echoed as in (a), as well as all external function call events that correspond to functions not in $\psi$ (b). By contrast, each external function call event corresponding to a function defined in $\psi$ is replaced with the callee's events (c), where the cursor remains in the same spot, ready to simulate the newly inserted events. Each step only performs one replacement at a time; the external function calls of the inlined behavior are not replaced until the cursor actually reaches them.

Then,  we  obtain  the  resulting  linked  semantics  by  big-stepping 
this  small-step  semantics  (see  examples  in  Fig.  2). 

Consider a function $f_0$, and a behavior $\sigma \bullet \mathrm{Extcall}(f, m_1, m_2) \cdot b$ being simulated. Assume that the prefix event sequence $\sigma$ has already been simulated, so the Extcall is the first encounter of an external function call event where $f \in F'$. Then $m_1$ is the memory state under which $f$ is to be called, and $m_2$ is the expected memory state upon return of $f$. Now, a behavior $b_2$ is chosen in $\psi(f)(m_1)$, and is to be simulated, whereas the expected return memory state $m_2$ as well as the remaining behavior $b$ of the caller to be simulated are pushed on top of a continuation stack. There are three cases:

• The simulation of this chosen behavior $b_2$ terminates with the expected return memory state $m_2$. In this case, the remaining behavior $b$ of the caller $f_0$, after the external call, popped from the continuation stack along with $m_2$, can be simulated.

• The simulation of this chosen behavior $b_2$ goes wrong, diverges or reacts. In such cases, the simulation result of $b_2$ takes over and never returns; the remaining behavior $b$ of the caller after the external call event is discarded.

• The simulation of this chosen behavior $b_2$ terminates, but with a return memory state that is not $m_2$. In this case, the remaining behavior of the caller is discarded, too, because it was relevant only in the case of termination with $m_2$. Actually, it means that the simulation of the caller behavior is spurious. This is because the set of behaviors of the caller $f_0$ has a behavior $\sigma \bullet \mathrm{Extcall}(f, m_1, m_2') \cdot b'$ for every $m_2'$, but most of the guesses are wrong. Even though the simulation of this particular behavior does not make sense, the rule (EXTCALL) guarantees that all possibilities are covered; hence there will always be at least one behavior that is not spurious, and it is safe to tag this irrelevant behavior as spurious. Formally, the simulation will not go wrong, but abruptly terminate with a special result Spurious. In the end, when big-stepping the small-step semantics, those spurious behaviors can be easily removed.

$$
\begin{array}{lll}
g & f & (f \bowtie g)(f) \\
\hline
\varepsilon \downarrow & \mathrm{Extcall}(g) :: \varepsilon \downarrow & \varepsilon \downarrow \\
\varepsilon \nearrow & \mathrm{Extcall}(g) \cdot b & \varepsilon \nearrow \\
e :: \varepsilon \downarrow & \mathrm{Extcall}(g) :::: \mathrm{Extcall}(g) :::: \cdots & e :::: e :::: \cdots \\
\varepsilon \downarrow & \mathrm{Extcall}(g) :::: \mathrm{Extcall}(g) :::: \cdots & \varepsilon \nearrow \\
\mathrm{Extcall}(f) \cdot b' & \mathrm{Extcall}(g) \cdot b & \varepsilon \nearrow
\end{array}
$$

Figure 2. Examples of behaviors with external function calls between two compilation units, one defining $f$, another defining $g$, obtained by big-stepping the behavior simulation semantics. To simplify, we assume that $f$ and $g$ do not use any memory state.

Definition 11 (Behavior simulation). We define the behavior simulation small-step semantics $\mathcal{S}[\psi]$ as follows:

• The set of events is $\mathbb{E}$.

• The set of configurations is defined as follows:

$$
\begin{array}{rcll}
s \in S & ::= & \mathrm{Spurious} & \text{Spurious state} \\
 & \mid & (b, \chi) \quad (b \in \mathbb{B},\ \chi \in (MS \times \mathbb{B})^{*}) & \text{Regular configuration}
\end{array}
$$

That is, either a special state for spurious executions, or a normal configuration with the current behavior $b$ being simulated, paired with $\chi$, the stack of the remaining expected outcomes (if the current behavior simulations terminate) and the remaining behaviors to simulate.

• The transition relation $\psi \vdash \cdot \to \cdot$ is defined as follows:

$$
\frac{e \in E}{\psi \vdash (e \cdot b, \chi) \xrightarrow{e} (b, \chi)}
\qquad
\frac{f \notin \mathrm{dom}(\psi) \qquad e = \mathrm{Extcall}(f, m_1, m_2)}{\psi \vdash (e \cdot b, \chi) \xrightarrow{e} (b, \chi)}
$$

$$
\frac{b' \in \psi(f)(m_1)}{\psi \vdash (\mathrm{Extcall}(f, m_1, m_2) \cdot b, \chi) \to (b', (m_2, b) :: \chi)}
\qquad
\frac{}{\psi \vdash (\varepsilon \downarrow (m), (m, b) :: \chi) \to (b, \chi)} \ (\mathrm{RETURN})
$$

$$
\frac{}{\psi \vdash (\varepsilon \nearrow, \chi) \to (\varepsilon \nearrow, \chi)}
\qquad
\frac{m' \neq m}{\psi \vdash (\varepsilon \downarrow (m'), (m, b) :: \chi) \to \mathrm{Spurious}} \ (\mathrm{RETURN\text{-}SPURIOUS})
$$

• The set of results is defined as follows:

$$
\begin{array}{rcll}
r \in R & ::= & \mathrm{Spurious} & \text{Spurious behavior} \\
 & \mid & m \quad (m \in MS) & \text{Regular termination}
\end{array}
$$

• The configuration $(\varepsilon \downarrow (m), \varepsilon)$ is the only final state with result $m \in MS$. Spurious is the only final state with result Spurious.

Definition 12 (Resolution). Let $\mathbb{B}_{Spurious} = \{\sigma \downarrow (\mathrm{Spurious}) : \sigma \in \mathbb{E}^{*}\}$ be the set of all spurious behaviors.

Then, the resolution of an open observation $\psi \in \Psi(F')$ is the closed observation $\mathcal{R}(\psi) \in \Theta(F')$ defined using the big-step semantics of the behavior simulation small-step semantics, excluding spurious behaviors:

$$\mathcal{R}(\psi)(f)(m) = \bigcup_{b \in \psi(f)(m)} (\!|\mathcal{S}[\psi]|\!)(b, \varepsilon) \setminus \mathbb{B}_{Spurious}$$
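The resolution operator can be prototyped for finite behaviors. The Python sketch below is an illustration, not the Coq definition, and rests on assumptions of its own: a behavior is a triple `(kind, trace, result)` with `kind` in `"term"`, `"wrong"`, `"div"`; regular events are strings and external calls are tuples `("Extcall", f, m1, m2)`; an observation `psi` maps each function name to a Python function from memory states to finite sets of behaviors. Reacting behaviors are not represented, and because mutual recursion between modules can make the simulation run forever, a hypothetical `fuel` bound reports `"out_of_fuel"` instead of deciding silent divergence.

```python
def simulate(psi, b, stack=(), out=(), fuel=50):
    """Yield the non-spurious outcomes of simulating behavior b under psi.

    stack holds pairs (expected return memory, rest of the caller's behavior);
    out is the trace of events echoed so far."""
    kind, trace, result = b
    if trace:
        e, rest = trace[0], (kind, trace[1:], result)
        if isinstance(e, tuple) and e[0] == "Extcall" and e[1] in psi:
            _, f, m1, m2 = e
            if fuel == 0:
                yield ("out_of_fuel", out, None)
                return
            for callee in psi[f](m1):          # inline one callee behavior
                yield from simulate(psi, callee, ((m2, rest),) + stack, out, fuel - 1)
        else:                                  # regular or still-external event
            yield from simulate(psi, rest, stack, out + (e,), fuel)
    elif kind == "term" and stack:
        (expected, caller_rest), below = stack[0], stack[1:]
        if result == expected:                 # RETURN
            yield from simulate(psi, caller_rest, below, out, fuel)
        # else RETURN-SPURIOUS: the guess was wrong, drop this run
    else:
        # Final termination, or a callee that goes wrong or diverges and
        # takes over: the pending caller behaviors are discarded.
        yield (kind, out, result)


def resolve(psi, f, m):
    return {r for b in psi[f](m) for r in simulate(psi, b)}


MEM = (0, 1, 2)   # assumed finite set of memory states


def f_obs(m):
    # f calls g, guessing every possible return memory m2, then prints it.
    return {("term", (("Extcall", "g", m, m2), f"OUT({m2})"), m2) for m2 in MEM}


def g_obs(m):
    # g emits one event and increments the memory state modulo 3.
    return {("term", ("e",), (m + 1) % 3)}
```

In `resolve({"f": f_obs, "g": g_obs}, "f", 0)`, only the guess `m2 = 1` matches what `g` actually returns, so the two other caller behaviors are dropped as spurious and a single linked behavior survives.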

5.2  Semantic  linking 

Thanks  to  the  resolution  operator,  we  can  simply  define  the  linking 
of  two  observations: 

Definition 13 (Linking). Let $\psi_1$, $\psi_2$ be two observations with disjoint domains. Then, their linking $\psi_1 \bowtie \psi_2$ is defined as:

$$\psi_1 \bowtie \psi_2 = \mathcal{R}(\psi_1 \uplus \psi_2)$$

With  the  definition  of  linking  at  the  level  of  behaviors,  we  can 
show  that  the  compositional  semantics  of  a  compilation  unit  is 
indeed  compositional.  In  other  words,  in  the  special  case  where  the 
two  modules  are  in  the  same  language,  linking  their  compositional 
semantics  at  the  level  of  their  behaviors  exactly  corresponds  to  the 
compositional  semantics  of  the  syntactic  concatenation  of  the  two 
compilation  units,  which  conforms  to  the  intuition  of  linking: 

Theorem 1. If $u_0$, $u_1$ are two compilation units with disjoint domains in the same language with function calls, then:

$$[\![u_0 \uplus u_1]\!]_{comp} = [\![u_0]\!]_{comp} \bowtie [\![u_1]\!]_{comp}$$

Proof (in Coq).

• $\supseteq$: We introduce a simulation diagram: an execution step in $[\![u_0]\!]_{comp} \bowtie [\![u_1]\!]_{comp}$ matches at least one execution step in $[\![u_0 \uplus u_1]\!]_{comp}$. In this simulation diagram, we maintain an invariant between the configuration state $(b, \chi)$ in $[\![u_0]\!]_{comp} \bowtie [\![u_1]\!]_{comp}$ and the configuration state $(m, l, \kappa)$ in $[\![u_0 \uplus u_1]\!]_{comp}$ such that $\kappa$ can be decomposed into $\kappa' \mathbin{+\!\!+} \kappa''$ with $b$ being a valid behavior in $[\![u_i]\!]_{comp}$ from $(m, l, \kappa')$ and $\kappa''$ matching the stack $\chi$.

• $\subseteq$: The result for terminating and stuck behaviors is proven by induction on the length of the execution. On the other hand, diverging and reacting behaviors are dealt with in the following way. Starting from such a behavior $b$ in $[\![u_0 \uplus u_1]\!]_{comp}$, we first isolate an infinite step sequence corresponding to $b$ by definition. Given $i \in \{0, 1\}$, we build a behavior $b'$ in $[\![u_i]\!]_{comp}$ by replacing the calls to functions in $u_{1-i}$ with external calls, and we prove that simulating $b'$ in $\mathcal{S}[[\![u_0]\!]_{comp} \uplus [\![u_1]\!]_{comp}]$ yields $b$, i.e. $b \in (\!|\mathcal{S}[[\![u_0]\!]_{comp} \uplus [\![u_1]\!]_{comp}]|\!)(b', \varepsilon)$. There are three cases (each of which is illustrated in Fig. 3):

(a) There is a call to a function in $u_{1-i}$ that never terminates. So, before the first such external call, we can build the finite prefix of a behavior in $[\![u_i]\!]_{comp}$, and deal with this external function call by coinduction, replacing $i$ with $1 - i$.

(b) All calls to functions in $u_{1-i}$ terminate and there are finitely many of them. So, until the last such external function call, we can build the finite prefix of a behavior in $[\![u_i]\!]_{comp}$. Then, we prove that the remaining behavior of $[\![u_0 \uplus u_1]\!]_{comp}$ that calls no functions of $u_{1-i}$ is actually a behavior in $[\![u_i]\!]_{comp}$.

(c) All calls to functions in $u_{1-i}$ terminate but there are infinitely many of them. So, we have to build a reacting behavior in $[\![u_i]\!]_{comp}$ with infinitely many external function calls to $u_{1-i}$, each one replacing each call to a function in $u_{1-i}$.

□

[Figure 3 diagram: call structures between $u_0$ and $u_1$ for cases (a), (b) and (c); in case (b), the tail after the last external call is annotated "No Extcall from here on".]

Figure 3. Illustrations of the three cases in the $\subseteq$ branch of the proof of Theorem 1: (a) one of the calls to $u_0$ never returns; (b) all external calls return and there are finitely many of them; (c) all external calls return and there are infinitely many of them. Dotted arrows denote the coinduction hypothesis.


We could have used behavior trees to model external function calls. Behavior trees are well known in denotational semantics as a way to model input. They would have turned events for external function calls $\mathrm{Extcall}(f, m_1, m_2)$ into branching nodes, with each branch labeled with the memory state $m_2$. Using behavior trees instead of plain behaviors would have helped remove spurious behaviors, as the two rules (RETURN) and (RETURN-SPURIOUS) would have been replaced by a single rule actually choosing the right branch in the behavior tree. However, this would require adopting behavior trees as the semantic object for the compositional semantics. Then, the process of making the procedural semantics into a compositional semantics would bring deep changes to the procedural small-step semantics, and the current compiler correctness proofs of CompCert based on simulation diagrams over those small-step semantics would require deep changes as well.

We believe that our current per-behavior setting, where behaviors are represented as first-order objects, requires less intrusive changes in the current CompCert proofs. From the compiler's point of view, the external function call events introduced by our semantics need not be treated differently from ordinary events.

The relationship between the compositional semantics and the procedural semantics of a module viewed as a whole program is rather obvious: it suffices to link the compilation unit with an observation that makes every external function call stuck.

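Lemma 2 (Compositional and procedural semantics). Let $u$ be a compilation unit in some language with function calls. Define the constant stuck observation $\psi_{\lightning}$:

$$\forall f \notin \mathrm{dom}(u),\ \forall m : \psi_{\lightning}(f)(m) = \{\varepsilon\,\lightning\}$$

Then, $\forall f \in \mathrm{dom}(u) : [\![u]\!](f) = ([\![u]\!]_{comp} \bowtie \psi_{\lightning})(f)$.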

6.  Refinement  and  compiler  correctness 

The term “refinement” in program development dates back to the early 1970s, when it was proposed by Dijkstra [5] and Wirth [21], and it quickly spread to various fields [15]. Refinement also plays a major role in compiler verification, as shown in CompCert [12] and by Müller-Olm [16].

In  this  work,  we  use  refinement  to  define  and  prove  correctness 
of  separate  compilation.  We  first  state  the  necessary  conditions 
for  a  relation  to  be  a  refinement  relation.  Then  we  show  how  our 
refinement  framework  applies  to  compiler  correctness.  Finally,  we 
show  that  the  behavior  refinement  relation  defined  in  CompCert 
extends  well  to  the  setting  of  our  compositional  semantics. 

6.1  Refinement  relations 

Instead of defining them on pairs of programs or specifications, we define our refinement relations on sets of extended behaviors. One reason for this choice is to support refinement between multiple languages and program logics. Another reason is to better handle interactions between refinement relations and the linking operator or, more generally, between refinement relations and the resolution operator, which we will discuss at the end of this subsection.

To  generalize  refinement  to  the  compositional  semantics  instead 
of  sticking  to  the  procedural  semantics  of  a  whole  program,  we 
define  refinement  relations  on  sets  of  extended  behaviors  instead  of 
plain  behaviors. 

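Let $\sqsubseteq$ be a binary relation on $\mathcal{P}(\mathbb{B})$. Then, we lift it to observations in a straightforward way: we define $\psi_1 \sqsubseteq \psi_2$ if, and only if, $\mathrm{dom}(\psi_1) = \mathrm{dom}(\psi_2)$ and:

$$\forall f \in \mathrm{dom}(\psi_1),\ \forall m :\ \psi_1(f)(m) \sqsubseteq \psi_2(f)(m)$$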
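The pointwise lifting can be sketched in a few lines. This is only an illustration under simplifying assumptions: an observation is tabulated as a dictionary from function names to dictionaries from (finitely many) memory states to sets of behaviors, and the names `lift` and `subset` are hypothetical, not taken from the Coq development.

```python
def lift(rel, psi1, psi2):
    """Lift a relation on sets of behaviors to observations, pointwise."""
    # The two observations must have the same domain.
    if psi1.keys() != psi2.keys():
        return False
    for f in psi1:
        # Assumption: both observations are tabulated over the same
        # finite set of memory states.
        if psi1[f].keys() != psi2[f].keys():
            return False
        if not all(rel(psi1[f][m], psi2[f][m]) for m in psi1[f]):
            return False
    return True


def subset(b1, b2):
    """Plain subset relation on sets of behaviors."""
    return b1 <= b2


# An implementation with fewer behaviors than its specification.
psi_impl = {"f": {"m0": {"a"}, "m1": {"b"}}}
psi_spec = {"f": {"m0": {"a", "c"}, "m1": {"b"}}}
psi_other = {"g": {"m0": {"a"}, "m1": {"b"}}}
```

Here `lift(subset, psi_impl, psi_spec)` holds, while swapping the arguments or comparing observations with different domains does not.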

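Definition 14 (Refinement relations). A binary relation $\sqsubseteq$ on $\mathcal{P}(\mathbb{B})$ is a refinement on observable behaviors if all these hold:

• reflexivity: $\forall B_0 \in \mathcal{P}(\mathbb{B}),\ B_0 \sqsubseteq B_0$

• transitivity: $\forall B_1, B_2, B_3 \in \mathcal{P}(\mathbb{B}) :\ B_1 \sqsubseteq B_2 \wedge B_2 \sqsubseteq B_3 \Rightarrow B_1 \sqsubseteq B_3$

• congruence: for any observations such that $\psi_1$ is never empty ($\forall f \in \mathrm{dom}(\psi_1),\ \forall m,\ \psi_1(f)(m) \neq \emptyset$):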

$\psi_1 \sqsubseteq \psi_2 \Rightarrow \ldots \sqsubseteq \ldots$

On top of a preorder, we add the congruence property, thanks to which we can easily show that refinement is compositional:

Theorem 2 (Compositionality of refinement). Let $\psi_1, \psi_2$ be two observations such that $\psi_1 \sqsubseteq \psi_2$ and $\psi_1$ is never empty. Then, for any never-empty observation $\psi$ with a domain disjoint from $\psi_1$, we have $\psi \bowtie \psi_1 \sqsubseteq \psi \bowtie \psi_2$.

We  will  see  in  Sec.  6.4  that  the  CompCert  improvement  relation 
is  actually  a  refinement  relation  meeting  all  those  requirements. 

6.2  Compositional  program  verification 

Refinement is expressed at the level of extended behaviors. So, we can consider that a specification is an open observation, so that a compilation unit $u_1$ in some language with function calls can be said to make a specification $\psi_1$ hold if $[\![u_1]\!]_{\mathrm{comp}} \sqsubseteq \psi_1$. (The specification can still contain external function call events to functions outside of $\mathrm{dom}(u_1)$.)

Thanks to refinement compositionality, and to the fact that the linking operator is defined at the semantic level of extended behaviors, this program verification scheme is compositional and suitable for open modules. Indeed, a compilation unit $u_2$ with a domain disjoint from $\psi_1$ can be proven to make some specification $\psi$ hold under the assumption that it is linked with an unknown library verifying $\psi_1$, independently of the actual implementation of the library: we can directly link the compilation unit $u_2$ with the specification $\psi_1$ and prove that $\psi_1 \bowtie [\![u_2]\!]_{\mathrm{comp}} \sqsubseteq \psi$. Then, if $[\![u_1]\!]_{\mathrm{comp}} \sqsubseteq \psi_1$, refinement compositionality gives us the refinement proof on the linked program: $[\![u_1]\!]_{\mathrm{comp}} \bowtie [\![u_2]\!]_{\mathrm{comp}} \sqsubseteq \psi$; finally, in the special case where $u_1$ and $u_2$ are written in the same language, we have $[\![u_1 \uplus u_2]\!]_{\mathrm{comp}} \sqsubseteq \psi$.

6.3  Verified  separate  compilation 

In this work, we follow the common notion of compiler correctness that a compiler is correct if all possible behaviors of the target program are valid behaviors of the source program. Since language specifications often leave some decisions to compilers for flexibility, a compiler is allowed to remove behaviors. In other words, compilation is a refinement step.

Our compiler correctness definition is fairly standard, except that it uses an abstract refinement relation instead of a plain subset relation. As it uses a refinement relation on extended behaviors, compiler correctness generalizes to compiling open modules by considering their compositional semantics.

Definition 15 (Compiler (optimizer) correctness). Let $\mathcal{L}, \mathcal{L}'$ be two languages with function calls.

Under a refinement relation $\sqsubseteq$, a compiler $C$ from $\mathcal{L}$ to $\mathcal{L}'$ is said to be correct if and only if for any compilation unit $u$, $[\![C(u)]\!]_{\mathrm{comp}} \sqsubseteq [\![u]\!]_{\mathrm{comp}}$.

Theorem 3. Under a refinement relation, multiple correct compilers are compatible with separate compilation. If $u_1, \ldots, u_n$ are compilation units with disjoint domains and $C_1, \ldots, C_n$ are all correct compilers, then: $[\![C_1(u_1) \uplus \cdots \uplus C_n(u_n)]\!] \sqsubseteq [\![u_1 \uplus \cdots \uplus u_n]\!]$

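Proof. By definition of compiler correctness, $\forall i,\ [\![C_i(u_i)]\!]_{\mathrm{comp}} \sqsubseteq [\![u_i]\!]_{\mathrm{comp}}$. By transitivity and multiple applications of refinement compositionality (Theorem 2), we obtain

$$[\![C_1(u_1)]\!]_{\mathrm{comp}} \bowtie \ldots \bowtie [\![C_n(u_n)]\!]_{\mathrm{comp}} \sqsubseteq [\![u_1]\!]_{\mathrm{comp}} \bowtie \ldots \bowtie [\![u_n]\!]_{\mathrm{comp}}$$

which leads to $[\![C_1(u_1) \uplus \cdots \uplus C_n(u_n)]\!]_{\mathrm{comp}} \sqsubseteq [\![u_1 \uplus \cdots \uplus u_n]\!]_{\mathrm{comp}}$ because of Theorem 1 (linking in the same language). Finally, to go from the compositional to the procedural semantics, refinement compositionality with the stuck observation $\psi_{\lightning}$ and Lemma 2 give the result. □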
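As an illustration, the chain of refinements behind this proof can be spelled out for two compilation units. This is only a sketch: it writes $\bowtie$ for linking at the level of observations, and it additionally assumes that $\bowtie$ is commutative and that the compositional semantics involved are never empty, so that Theorem 2 can be applied to either argument.

```latex
\begin{align*}
[\![C_1(u_1)]\!]_{\mathrm{comp}} \bowtie [\![C_2(u_2)]\!]_{\mathrm{comp}}
  &\sqsubseteq [\![u_1]\!]_{\mathrm{comp}} \bowtie [\![C_2(u_2)]\!]_{\mathrm{comp}}
  && \text{(Theorem 2 with } [\![C_1(u_1)]\!]_{\mathrm{comp}} \sqsubseteq [\![u_1]\!]_{\mathrm{comp}}\text{)} \\
  &\sqsubseteq [\![u_1]\!]_{\mathrm{comp}} \bowtie [\![u_2]\!]_{\mathrm{comp}}
  && \text{(Theorem 2 with } [\![C_2(u_2)]\!]_{\mathrm{comp}} \sqsubseteq [\![u_2]\!]_{\mathrm{comp}}\text{)}
\end{align*}
```

Transitivity of the refinement relation then combines the two steps into a single refinement between the linked target semantics and the linked source semantics.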

The  theorem  tells  us  that  we  can  link  several  object  files  which 
are  compiled  independently  with  potentially  different  compilers. 
As  long  as  all  the  compilers  are  correct,  the  linked  executable  will 
behave  as  an  instance  of  the  program  linked  at  the  source  level. 

With a single correct compiler $C$, Theorem 3 ensures the correctness of separate compilation even though we may not have $C(u_1) \uplus \cdots \uplus C(u_n) = C(u_1 \uplus \cdots \uplus u_n)$ (e.g. if $C$ performs some function inlining).

6.4  Example:  the  CompCert  refinement  relation 

When developing a compiler, it is usually hard or even impossible to retain one kind of behavior: the stuck behaviors. Imagine a C program that takes the address of a local variable, adds a constant to it, and then uses the result as the address to write to. If the arithmetic operation brings the address out of bounds, the C semantics will get stuck. In the target assembly code, however, the program is likely to continue running and crash at a much later point, or even keep going normally, as the place the program writes to might be an unused stack space.

In CompCert [11], all behaviors with the event sequence before crashing as a prefix are considered “improvements” of the crashing behavior. The refinement relation it uses, initially proposed by Dockins [6] and integrated into CompCert, incorporates improvements and is an extension of a subset relation. In this section, we extend it to extended behaviors with external function call events.

Definition 16 (Behavior improvement). Let $b_1, b_2$ be two extended behaviors. $b_1$ improves $b_2$ ($b_1 \sqsubseteq b_2$) if and only if:

• either $b_1 = b_2$, or

• $b_2$ is a “stuck prefix” of $b_1$: there exists an event sequence $\sigma$ and a behavior $b$ such that $b_2 = \sigma\lightning$ and $b_1 = \sigma \cdot b$.

Definition 17 (CompCert improvement relation). Let $B_1, B_2$ be two sets of extended behaviors. $B_1$ improves $B_2$ ($B_1 \sqsubseteq B_2$) if, and only if $\forall b_1 \in B_1,\ \exists b_2 \in B_2 :\ b_1 \sqsubseteq b_2$.
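The two definitions above can be illustrated on finite behaviors. The sketch below makes simplifying assumptions that are not part of the formal development: a behavior is a finite trace of events together with an outcome tag, and the names `Behavior`, `improves` and `set_improves` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Behavior(NamedTuple):
    trace: Tuple[str, ...]  # finite sequence of events
    outcome: str            # "stuck", "terminates" or "diverges"


def improves(b1, b2):
    """b1 improves b2: they are equal, or b2 is a stuck prefix of b1."""
    if b1 == b2:
        return True
    if b2.outcome != "stuck":
        return False
    sigma = b2.trace
    # b1 must start with the events emitted before b2 got stuck.
    return b1.trace[:len(sigma)] == sigma


def set_improves(B1, B2):
    """Every behavior of B1 improves some behavior of B2."""
    return all(any(improves(b1, b2) for b2 in B2) for b1 in B1)


# The source gets stuck after one event; the target keeps running.
source = Behavior(("write",), "stuck")
target = Behavior(("write", "print"), "terminates")
```

In this example `improves(target, source)` holds but `improves(source, target)` does not, which matches the intuition that a compiler may replace a crash by any continuation, but not the converse.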

Theorem  4.  The  CompCert  improvement  relation  is  a  refinement 
relation. 

Proof (in Coq). Congruence is proven by a lock-step backwards simulation, where the invariant between two configurations of the semantics uses behavior improvement for the behavior being simulated as well as every frame of the continuation stack. □

This theorem shows that the CompCert improvement relation defined on behaviors extends well to extended behaviors and verified separate compilation. Consequently, a correct compiler can compile an open module as if it were a whole program, by treating an external call event no differently from a regular event. Incidentally, it also shows that a correct compiler necessarily preserves external function calls: in no way can it optimize them away before linking with an actual implementation for them. This is understandable because a compiler processing an open module has no hypotheses about external functions.

6.5  Coq  implementation 

Our  Coq  implementation  provides  the  following  enhancements, 
which  we  did  not  mention  for  the  sake  of  presentation. 

Functions can be passed arguments, and they can return a value. Then, the arguments are additional parameters to the semantics of a module, and they appear in the external function call events as well as the return values. Similar to the resulting memory state upon return of an external function call, the caller has to provide a behavior for each possible return value as well: given an external function $f$ called with arguments $arg$ and the memory state $m_1$, the external function call rule (EXTCALL in Def. 10) of the compositional semantics produces an event $\mathsf{Extcall}(f, arg, m_1, ret, m_2)$ for any result $ret$ and any memory state $m_2$.

Throughout the execution of a language with function calls, we added the ability to maintain some invariant on the memory state. We equip the set of memory states with some preorder $\leq$, such that, whenever an internal step is performed from a memory state $m$, the new memory state $m'$ is such that $m \leq m'$. Consequently, the semantics of a compilation unit provides no behavior for those external function calls that do not respect $\leq$: in the compositional semantics, the rule for external function calls (EXTCALL in Def. 10) producing an external call event $\mathsf{Extcall}(f, arg, m_1, ret, m_2)$ requires the additional premise $m_1 \leq m_2$. This enhancement is important for CompCert, where the memory model requires that the memory evolve monotonically to prevent a deallocated memory block from being reused. The proofs of compilation passes in CompCert make critical use of this assumption.

In a language with function calls, the functions Backup and Restore, which respectively save the local state into a continuation stack frame and retrieve a new one from such a frame, can change the memory state: instead of only returning a frame or local state, they return a new memory state as well. We make those functions compatible with the preorder $\leq$ over memory states. This enhancement allows us to model the allocation and deallocation of a concrete stack frame in the memory upon function call and return.
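The effect of the additional premise on external call events can be sketched as follows. The encoding is purely illustrative: a memory state is summarized by its number of allocated blocks (so that the preorder is the usual order on integers), candidate results and memory states are drawn from finite lists, and the names `mem_le` and `extcall_events` are hypothetical.

```python
from itertools import product


def mem_le(m1, m2):
    """Hypothetical preorder: the number of allocated blocks never shrinks."""
    return m1 <= m2


def extcall_events(f, arg, m1, results, memories):
    """Events Extcall(f, arg, m1, ret, m2) allowed by the external call rule.

    One event is produced for every candidate result and every candidate
    memory state m2, provided that the premise m1 <= m2 holds.
    """
    return [("Extcall", f, arg, m1, ret, m2)
            for ret, m2 in product(results, memories)
            if mem_le(m1, m2)]


# Calling g(7) from a memory with 2 blocks: returning to a memory with
# only 1 block is ruled out by the premise.
events = extcall_events("g", 7, 2, results=[0, 1], memories=[1, 2, 3])
```

With these inputs, four events remain (two results times the two memory states that respect the preorder).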

We also provide a Coq implementation to instantiate our framework with the CompCert common subexpression elimination pass to turn it from whole-program compilation to separate compilation.

This pass operates on CompCert RTL (“register transfer language”) as both source and target language. It is a 3-address language with infinitely many per-function-call pseudo-registers. The body of a function is a control-flow graph.

Common subexpression elimination replaces nodes of the control-flow graph with no-ops if those nodes take part in expressions that were already computed before. This pass does not alter function calls and does not modify the memory between source and target programs.

We took a subset of RTL, eliminating floating-point operations (due to typing constraints). Then, we added our external function call event to the CompCert so-called “external functions” (namely primitives such as volatile load and store, memory copy, or I/O, some of which generate events) to enable their support by RTL. Then, we rewrote RTL into the setting of our framework and proved that the corresponding compositional semantics and the CompCert RTL language with those new events produce the same big-step semantics. Thus, there were no changes to the proof of the compilation pass (except the removal of floating-point operations), and the correctness of separate compilation was stated directly in terms of the original RTL semantics and proved using our framework.

7.  Languages  with  different  memory  state  models 

Our new approach is close to the way CompCert [12] handles I/O events. Actually, we generalize it to arbitrary external function calls, and we give the formal argument why this approach is correct by enabling those external functions to be implemented and their behaviors inlined. This means that the compiler correctness techniques used for CompCert and restricted to whole programs can be easily applied to open modules.

The main difference introduced by considering the behaviors of open modules is that now part of the memory state becomes observable. There still remains a problem: a compilation pass can alter the observable memory state.

Such alterations can deeply involve the structure of the memory state, so that the relation between the memory states of the source programs and the compiled ones can itself change during execution. Such relations are called Kripke worlds [2, 10] in the setting of Kripke logical relations. With them, however, it becomes necessary to define the refinement relation as a “binary” simulation diagram, abandoning the notion of “unary” semantics.

In this section, we show that such Kripke logical relations are not necessary to deal with critical memory-changing passes of CompCert. To this end, we introduce a lightweight infrastructure to deal with memory-changing relations, $\alpha$-refinement, that can directly cope with our unary semantics for open modules with traces of external function call events.

7.1  $\alpha$-refinement

In practice, a separate compiler does make some assumptions on the behaviors of external functions. If these assumptions are also preserved as an invariant by the execution of functions defined in $u$, the compilation of $u$ can take advantage of this invariant.


Consider a module $u$ written in a procedural language $\mathcal{L}$. Let $MS$ be the set of memory states of $\mathcal{L}$. Let $I \subseteq MS$ be an invariant in $\mathcal{L}$, i.e. such that for any local transition from a memory state $m$ to a memory state $m'$, if $m \in I$ then $m' \in I$. Then we can restrict the set of memory states of $\mathcal{L}$ to $I$, yielding a procedural language $\mathcal{L}|_I$ such that the corresponding compositional semantics mandates all external function calls to return with memory states also satisfying the invariant. In other words, for any external function call event $\mathsf{Extcall}(f, m_1, m_2)$ produced by the compositional semantics of $\mathcal{L}|_I$, we always have $m_1, m_2 \in I$.

Now consider a target procedural language $\mathcal{L}'$ having an invariant $I'$. Let $C$ be a compiler from $\mathcal{L}$ to $\mathcal{L}'$. Then, we say that $C(u)$ $\alpha$-refines $u$ ($C(u) \sqsubseteq_\alpha u$) if, and only if there exists a bijection $\alpha$ between $I$ and $I'$ such that $[\![C(u)]\!]_{\mathrm{comp}}|_{I'} \sqsubseteq \alpha([\![u]\!]_{\mathrm{comp}}|_{I})$.

In practice, it means that the separate compiler $C$ is correct when the modules are linked with other modules also satisfying the same invariants ($I$ in the source, $I'$ in the target). Indeed, when such a bijection $\alpha$ exists, we can define the procedural language $\alpha(\mathcal{L}|_I)$ isomorphic to $\mathcal{L}|_I$ where the set of memory states is $\alpha(I) = I'$, and then we can use the usual non-memory-changing refinement relation between $\alpha(\mathcal{L}|_I)$ and $\mathcal{L}'|_{I'}$. Then, separate compilation is correct provided that, when building the whole program by linking with a module containing a main entry point, the initial memory state passed to main also satisfies the invariant ($I$ in the source, $I'$ in the target).

Then, Theorem 3 can be rephrased as follows: if $u_1, \ldots, u_n$ are compilation units in languages $\mathcal{L}_1, \ldots, \mathcal{L}_n$ with disjoint domains and $C_1, \ldots, C_n$ are all compilers to the same target language $\mathcal{L}'$ such that, for each $i$, $C_i$ is correct with respect to an $\alpha_i$-refinement, then:

$$[\![C_1(u_1) \uplus \cdots \uplus C_n(u_n)]\!] \sqsubseteq \alpha_1([\![u_1]\!]) \bowtie \ldots \bowtie \alpha_n([\![u_n]\!])$$

In the rest of this section, we show how to systematically turn CompCert-style memory injection into an $\alpha$-bijection by using a critical memory-changing pass of CompCert as an example. The same technique can also be used to support translation of calling conventions (e.g., mapping local variables or temporaries in the source into stack entries in the target).

7.2  Case  study:  memory  injection  for  local  variable  layout 

One  of  the  most  critical  memory-changing  compilation  phases  in 
CompCert  is  the  phase  that  lays  out  local  variables  into  a  stack 
frame.  Indeed,  CompCert  does  not  represent  memory  as  a  unique 
byte  array,  but  as  a  collection  of  byte  arrays  called  memory  blocks. 
The  purpose  of  this  memory  model  is  to  allow  pointer  arithmetic 
only  within  the  same  block.  In  this  setting,  CompCert  defines  the 
semantics  of  a  subset  of  C  by  allocating  one  block  for  each  local 
variable,  so  that  the  following  code  example  indeed  gets  stuck 
(has  no  valid  semantics,  which  corresponds  to  undefined  behavior 
according  to  the  C  standard): 

void f (void)
{ int a[2] = {18, 42}, b[2] = {1729, 6};
  register int *pa = &a[2], *pb = &b[0];
  *pa = 3;  /* undefined behavior,
               NOT equivalent to *pb = 3 */ }

In this example, upon function entry, CompCert allocates two different memory blocks, one (say with identifier 2) of size 8 for a and one (say with identifier 3) of size 8 for b. Then, the pointer pa contains an address which is, in CompCert, not a plain integer, but a pair $\mathsf{Vptr}(b, o)$ of the block identifier $b$ and the byte offset $o$ within this block. So, the value of pa is actually $\mathsf{Vptr}(2, 8)$ whereas pb is $\mathsf{Vptr}(3, 0)$. So, the two pointers are not equal, and in fact, pa is not a valid pointer to store to, because the size of the block corresponding to a is 8. In other words, the boundary of one block is in no way related to other blocks. This instrumented semantics can help in tracking out-of-bounds array accesses in a C program (for instance using the reference interpreter included in CompCert to “animate” the formal semantics of CompCert C).

Figure 4. Injecting two arrays into the stack
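The block-based view of memory can be mimicked with a toy model. The sketch below is not CompCert's memory model: the class `Memory`, its methods, the 4-byte little-endian integer encoding and the dummy first block (used only so that a and b receive the identifiers 2 and 3 of the running example) are all assumptions made for illustration.

```python
class Memory:
    """Toy block-based memory: a collection of separate byte arrays."""

    def __init__(self):
        self.blocks = {}      # block identifier -> bytearray
        self.next_block = 1   # next fresh block identifier

    def alloc(self, size):
        """Allocate a fresh block of the given size, return its identifier."""
        b = self.next_block
        self.blocks[b] = bytearray(size)
        self.next_block += 1
        return b

    def store_int(self, ptr, value):
        """Store a 4-byte int at ptr = (block, offset).

        Returns None when the access is out of bounds, which models a
        stuck execution.
        """
        b, o = ptr
        block = self.blocks.get(b)
        if block is None or o < 0 or o + 4 > len(block):
            return None
        block[o:o + 4] = value.to_bytes(4, "little", signed=True)
        return self


mem = Memory()
mem.alloc(0)       # dummy block 1, so that a and b are blocks 2 and 3
a = mem.alloc(8)   # int a[2], assuming 4-byte ints
b = mem.alloc(8)   # int b[2]
pa = (a, 8)        # &a[2]: one past the end of block 2
pb = (b, 0)        # &b[0]
```

In this model, `mem.store_int(pa, 3)` returns `None` (the store is stuck), while `mem.store_int(pb, 3)` succeeds, and the two pointers are different values even though the arrays may end up adjacent in a concrete stack.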

But in practice, this C code is actually compiled by CompCert to an intermediate language called Cminor, which performs pointer arithmetic and memory operations for stack-allocated variables of a given function call in one single memory block, called the stack frame. The compiled Cminor code looks like the following C code:

void f (void)
{ char stk[16];
  *(int*)(&stk[0]) = 18;    *(int*)(&stk[4]) = 42;
  *(int*)(&stk[8]) = 1729;  *(int*)(&stk[12]) = 6;
  { register int* pa = (int*)(&stk[8]);
    register int* pb = (int*)(&stk[8]);
    *pa = 3; } }

The proof of the Cminor code generation, from the Csharpminor intermediate language still having one memory block for each local variable, is based on a memory transformation called memory injection [13]. An injection is a partial function $\iota : \mathrm{BlockID} \rightharpoonup (\mathrm{BlockID} \times \mathbb{Z})$ mapping a source memory block to an offset within a target memory block. In our example, the target memory allocates a stack frame block (say with block identifier 2) of size 16 bytes; the source memory block for a is mapped to offset 0 within this stack block, and b is mapped to offset 8: $\iota(2) = (2, 0)$ and $\iota(3) = (2, 8)$.
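Applying such an injection to the two pointers of the running example shows why they coincide after compilation. This is a minimal sketch in which the injection is a plain dictionary and `inject_ptr` is a hypothetical helper, not a CompCert function.

```python
def inject_ptr(iota, ptr):
    """Translate a source pointer (block, offset) through the injection."""
    b, o = ptr
    if b not in iota:
        return None                # the injection is a partial function
    b_target, delta = iota[b]
    return (b_target, o + delta)   # shift the offset inside the target block


# Injection of the example: a (block 2) at offset 0 and b (block 3) at
# offset 8 of the stack frame, which is block 2 of the target memory.
iota = {2: (2, 0), 3: (2, 8)}

pa_target = inject_ptr(iota, (2, 8))   # source value of pa, i.e. &a[2]
pb_target = inject_ptr(iota, (3, 0))   # source value of pb, i.e. &b[0]
```

Both pointers are mapped to offset 8 of the stack frame block, which is consistent with the compiled code above, where pa and pb are both initialized to the address of stk[8].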


7.3  Issues 

Although the CompCert memory injection is the most critical memory transformation used in CompCert and makes formal proofs of whole-program compilation fairly understandable (but by no means straightforward), it has several issues that make it difficult to turn those proofs into separate compilation (in the sense that it is difficult to turn the memory injection into a bijective memory transformation amenable to $\alpha$-refinement).

Granularity of preservation by memory operations. In the current correctness proof of the Csharpminor-to-Cminor pass, the memory injection is kept as an invariant, but the preservation properties make the memory injection hold even during the allocation of memory blocks corresponding to the source local variables. More precisely, assume that main is called from source memory $m_0$ related to target memory $m_0'$ by a memory injection $\iota_0$. Then:

1. First, the stack frame block $b'$ is created in the target memory, which becomes $m_1'$. Memory injection $\iota_0$ still holds between $m_0$ and $m_1'$.

2. Then, the memory block for the local variable a, say $b_2$, is created in the source memory, which becomes $m_2$. The memory injection between $m_2$ and $m_1'$ becomes $\iota_2 = \iota_0 \uplus (b_2 \mapsto (b', 0))$.

3. Then, the memory block for the local variable b, say $b_3$, is created in the source memory, which becomes $m_3$, injected into $m_1'$ through $\iota_3 = \iota_2 \uplus (b_3 \mapsto (b', 8))$.

The current memory injection invariant is too fine-grained because it also holds in the middle of allocating the memory blocks for the source local variables. It actually means that the target memory $m_1'$ is related to any source memory that can be obtained in the middle of the allocation of such source blocks, which prevents the injectivity of the memory transformation. Conversely, the allocation of the target stack frame block is performed without changing the source memory, so that the memory injection is not even functional.
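The two defects can be checked mechanically on the relation induced by the three allocation steps. In the sketch below, memories are represented only by their names, the relation lists the pairs (source memory, target memory) for which some injection holds during the steps, and the helper names are hypothetical.

```python
def is_functional(rel):
    """Each source memory is related to at most one target memory."""
    targets = {}
    for src, tgt in rel:
        # Remember the first target seen for src; any other one is a clash.
        if targets.setdefault(src, tgt) != tgt:
            return False
    return True


def is_injective(rel):
    """Each target memory is related to at most one source memory."""
    return is_functional({(tgt, src) for src, tgt in rel})


# Pairs related by some injection during the allocation steps:
# m0 with m0' initially, m0 with m1' after step 1, m2 with m1' after
# step 2, and m3 with m1' after step 3.
related = {("m0", "m0'"), ("m0", "m1'"), ("m2", "m1'"), ("m3", "m1'")}

# Relating memories only once all blocks are allocated.
coarse = {("m0", "m0'"), ("m3", "m1'")}
```

The fine-grained relation `related` is neither functional nor injective, whereas `coarse`, which keeps only the states before and after the whole allocation, is both.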

To remedy this problem, we make the preservation lemma for memory injection more coarse-grained: instead of specifying a per-allocation preservation property, we specify an all-in-one preservation property to reestablish injection only after all the blocks corresponding to source local variables are allocated.

Dynamic memory changes. The proofs of compilation passes involving memory injections build the block mapping on the fly during the execution of the program: whenever a block is allocated, the mapping is modified accordingly. But the mapping is not yet known for those source memory blocks that are not allocated yet, e.g. in future function calls, or heap allocations (malloc and free library functions). It means that the mapping dynamically changes during the execution of a program. This is why Kripke logical relations are used to handle memory-changing compilation passes.

To  solve  those  issues,  we  propose  to  define  a  stronger  notion 
of  memory  injection  in  two  steps.  First,  the  block  mapping  is 
computed  from  the  source  memory  using  additional  information 
contained  in  block  tags.  Then,  the  target  memory  is  computed  from 
both  the  source  memory  and  the  computed  block  mapping. 

7.4  Our  approach 

In fact, the memory transformation for the Csharpminor-to-Cminor pass is actually systematic and can be defined directly depending on the shape of the memory itself rather than specified by an invariant preserved by memory operations such as allocating a new block. To this end, we need to add more information into the memory in the form of tags attached to each memory block. Such information is provided by the language semantics when allocating a new memory block, and no longer changes during the execution of the program. It plays little active role in the execution of the program, as it is only used during the compilation proof.

Block identifiers. To make proofs simpler, we modify the semantics of Csharpminor and Cminor to keep the block identifiers synchronized so that as many blocks are “allocated” in the source as in the target. In the source, an empty block (within which no operation or pointer arithmetic is valid) is first allocated, then the blocks for local variables are allocated; whereas in the target, the stack frame block is first allocated with its size, then many empty blocks are allocated, one for each variable.

This has no impact on performance: such empty blocks can be considered as logical information, which corresponds to no memory in practice. They are not even reachable in the program.

Tags. A block has a tag of one of the following forms:

$$\begin{array}{rcll}
t \in T & ::= & \mathsf{Heap} & \text{global variable or free store} \\
 & \mid & \mathsf{Stack}(\mathsf{Main}(f, sz)) & \text{stack frame for function } f \text{ of size } sz \text{ bytes} \\
 & \mid & \mathsf{Stack}(\mathsf{Var}(f, id, b, sz, of)) & \text{local variable } id \text{ in } f \text{ of size } sz \text{ injected into } b \text{ at offset } of
\end{array}$$

Information defined in the tags is provided either by the semantics of Csharpminor (e.g. the identifier $b$ of the corresponding Main block in the tags of Var blocks) or by a previous compilation phase (e.g. offsets) within Csharpminor without changing the actual contents of memory blocks.

Specification of injection. We can now replace CompCert memory injection with a stronger injection $\mathrm{INJ}(\iota, m, m')$ between a source memory $m$ and a target memory $m'$ axiomatized as follows:

• The empty memory injects into itself with $\iota = \emptyset$.

• If $\mathrm{INJ}(\iota, m, m')$, then $\mathrm{INJ}(\iota \uplus (b \mapsto (b, 0)),\ m \uplus \{b\},\ m' \uplus \{b\})$ for any allocation of a new block $b$ with tag $\mathsf{Heap}$

• If $\mathrm{INJ}(\iota, m, m')$, then, if the source allocates one empty block $b$ with tag $\mathsf{Stack}(\mathsf{Main}(f, sz))$ and several blocks corresponding to the local variables of $f$ with tags $\mathsf{Stack}(\mathsf{Var}(f, id_i, b, sz_i, of_i))$ so that the intervals $[of_i, of_i + sz_i)$ are a partition of $[0, sz)$, then the target memory allocating one block $b$ of size $sz$ and as many empty blocks is related to the resulting source memory by $\mathrm{INJ}$ with $\iota \uplus \{(b + i \mapsto (b, of_i)) : 1 \leq i \leq n\}$.

• Load, store and free operations are preserved with respect to $\iota$

• If $\mathrm{INJ}(\iota, m, m_1)$ and $\mathrm{INJ}(\iota, m, m_2)$, then $m_1 = m_2$

• If $\mathrm{INJ}(\iota_1, m_1, m)$ and $\mathrm{INJ}(\iota_2, m_2, m)$, then $(\iota_1, m_1) = (\iota_2, m_2)$

• The block tags of the source and target memories are the same
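The side condition of the stack allocation axiom and the entries it adds to the block mapping can be sketched as follows. The layout is given as a list of (offset, size) pairs, one per local variable in allocation order; both helper names are hypothetical.

```python
def is_partition(layout, sz):
    """Check that the intervals [of, of + size) partition [0, sz)."""
    pos = 0
    for of, size in sorted(layout):
        if size <= 0 or of != pos:
            return False      # empty piece, gap or overlap
        pos = of + size
    return pos == sz


def stack_mapping(b, layout):
    """Entries b + i -> (b, of_i), for i = 1..n, added to the injection."""
    return {b + i: (b, of) for i, (of, _size) in enumerate(layout, start=1)}


# Running example: a at offset 0 and b at offset 8, both of size 8,
# in a 16-byte frame whose Main block has identifier 2.
layout = [(0, 8), (8, 8)]
new_entries = stack_mapping(2, layout)
```

With synchronized block identifiers, the Main block is block 2, so the variables a and b become blocks 3 and 4 and `new_entries` maps them to `(2, 0)` and `(2, 8)`. A layout with a gap or an overlap is rejected by `is_partition`.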

Then, the memory transformation is defined as the partial injective function $\alpha(m) = \{m' : \exists \iota,\ \mathrm{INJ}(\iota, m, m')\}$. Then, we change the forward simulation proof of the CompCert Csharpminor-to-Cminor pass by replacing the injection with $\mathrm{INJ}$, which incidentally proves that, actually, Csharpminor makes the invariant $\mathrm{dom}(\alpha)$ hold.

Implementation. To realize those axioms, we use information contained in tags to first compute the block mapping $\iota$ from the source memory $m$. For any block identifier $b$, if the tag of $b$ in $m$ is $\mathsf{Stack}(\mathsf{Main}(\ldots))$, then $\iota(b)$ is undefined; if the tag of $b$ in $m$ is $\mathsf{Stack}(\mathsf{Var}(f, id, b', sz, of))$, then $\iota(b) = (b', of)$; otherwise, $\iota(b) = (b, 0)$.
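This computation only inspects tags, as the following sketch shows. The tuple encoding of tags and the name `block_mapping` are assumptions made for the example.

```python
def block_mapping(tags):
    """Compute the block mapping from the tags of the source memory.

    tags maps a block identifier to one of (hypothetical encoding):
      ("Heap",)
      ("Main", f, sz)
      ("Var", f, ident, b_main, sz, of)
    """
    iota = {}
    for b, tag in tags.items():
        if tag[0] == "Main":
            continue                  # the mapping is undefined on Main blocks
        if tag[0] == "Var":
            _, _f, _ident, b_main, _sz, of = tag
            iota[b] = (b_main, of)    # variable placed inside its frame
        else:
            iota[b] = (b, 0)          # heap blocks are mapped to themselves
    return iota


# One heap block, then the frame of f with its two local arrays.
tags = {
    1: ("Heap",),
    2: ("Main", "f", 16),
    3: ("Var", "f", "a", 2, 8, 0),
    4: ("Var", "f", "b", 2, 8, 8),
}
iota = block_mapping(tags)
```

Because the mapping is a function of the tags alone, it no longer needs to be built on the fly during execution.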

Then, we must assume that the memory $m$ is well-formed: any block of $m$ of tag $\mathsf{Stack}(\mathsf{Main}(\ldots))$ is empty, there is no pointer to it anywhere in the memory, and it is followed by exactly the right number of blocks of tag $\mathsf{Stack}(\mathsf{Var}(f, id, b, sz, of))$ corresponding to the local variables of $f$ and whose valid offsets lie between 0 and $sz$. This well-formedness condition is actually an invariant satisfied by the source language, and it will be the domain of $\alpha$.

From such a memory, we can now construct the target memory $m'$ from the memory $m$ as follows, by scanning it from the first block. Assuming we treated all blocks between 1 and $b - 1$, we treat block $b$ and the following ones as follows:

• If $b$ is the identifier of the next block available for allocation¹, then we are done.

• Otherwise, the block $b$ is well-defined. If $b$ is a heap block, then copy its contents (transforming pointers by $\iota$) to the target block with the same identifier $b$, and move to the next block $b + 1$.

• Otherwise, the block $b$ is necessarily of tag $\mathsf{Stack}(\mathsf{Main}(f, sz))$, and is empty, and its next blocks correspond to the local variables of $f$ (say that there are $n$ of those). Then, in the target memory $m'$, $b$ will have size $sz$ and receive the contents (accordingly transforming pointers by $\iota$) of the following blocks of $m$ at the offsets specified by their tags; but in $m'$, those blocks will be left empty. Then move to the next block $b + n + 1$.
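The scan described above can be sketched as follows. This is an illustration under strong assumptions: the source memory is well-formed, block contents are lists of byte values that hold no pointers (so the translation of pointers by the block mapping is omitted), tags use a hypothetical tuple encoding, and `build_target` is not a function of the Coq development.

```python
def build_target(source, next_block):
    """Build the target memory by scanning blocks 1 .. next_block - 1.

    source maps a block identifier to (tag, contents), with tags encoded as
    ("Heap",), ("Main", f, sz) or ("Var", f, ident, b_main, sz, of).
    """
    target = {}
    b = 1
    while b != next_block:                   # next free identifier: done
        tag, contents = source[b]
        if tag[0] == "Heap":
            target[b] = (tag, list(contents))  # same identifier, same bytes
            b += 1
            continue
        # Otherwise b has tag Main(f, sz) and is empty in the source.
        _, _f, sz = tag
        frame = [0] * sz
        n = 0
        while b + n + 1 < next_block:
            var_tag, var_contents = source[b + n + 1]
            if var_tag[0] != "Var" or var_tag[3] != b:
                break                        # no more variables of this frame
            of = var_tag[5]
            frame[of:of + len(var_contents)] = var_contents
            target[b + n + 1] = (var_tag, [])  # left empty in the target
            n += 1
        target[b] = (tag, frame)             # the frame receives all contents
        b += n + 1
    return target


source = {
    1: (("Heap",), [7, 7]),
    2: (("Main", "f", 16), []),
    3: (("Var", "f", "a", 2, 8, 0), [1] * 8),
    4: (("Var", "f", "b", 2, 8, 8), [2] * 8),
}
target = build_target(source, next_block=5)
```

In the resulting target memory, block 2 holds the 16 bytes of both arrays, blocks 3 and 4 are empty, and every block keeps the tag it had in the source.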

Contrary  to  CompCert  memory  injections,  there  are  no  additional 
memory  locations  in  the  target  that  do  not  correspond  to  any  source 
memory  locations.  This  is  enabled  by  the  fact  that  we  also  add 
alignment  constraints  along  with  block  tags  to  prevent  alignment 
padding.  For  the  sake  of  brevity,  we  do  not  explain  this  issue  here. 


^  A  memory  always  has  finitely  many  blocks,  and  the  number  of  blocks 
always  increases  because  freed  locations  are  never  reused,  so  that  a  freed 
block  is  never  actually  deleted  (only  its  locations  are  turned  into  unusable 
ones)  and  any  newly  allocated  block  is  always  fresh. 


8.  Related  work  and  conclusions 

Our compositional semantics is designed primarily for C-like languages,
so it is not directly applicable to ML-like functional languages,
which have more sophisticated semantic models. C-like languages
support first-class function pointers, but they do not allow
function terms (e.g., λx.e) as first-class values. C-like languages
also support intensional operations such as equality tests on function
pointers, so it is unsound to replace one function pointer with
another even if they point to functions with the same observable
behaviors. This allows us to use much simpler semantic objects (e.g.,
memory blocks with code pointers as in CompCert [13]) than the
sophisticated models developed for functional languages [2, 10, 1].

Compositional trace/game semantics  Our idea of modeling the
behavior of each external function call as an Extcall(f, m, m')
event (see Sec. 4) resembles similar treatments in compositional
trace or game semantics [4, 8]. Brookes’s transition-trace semantics
[4] models environment transitions for shared-memory concurrent
languages. Under Brookes’s semantics, a thread’s behavior
is described as a set of transition traces, each consisting of
a sequence of state transition steps
$(m_1, m_1') :: (m_2, m_2') :: \dots :: (m_n, m_n')$.
The gaps between consecutive steps (e.g., $m_1'$ and $m_2$, or
$m_{n-1}'$ and $m_n$) signal the state transitions made by other threads
in the environment. Composing two threads involves calculating all
the interleavings of pairs of transition traces (one from each thread)
and their stuttering and mumbling closures.

Our Extcall(f, m, m') event also uses a pair of memory states
(m, m') to signal state transitions made by the environment (i.e.,
external calls). Our semantic linking operation (see Sec. 5) also does
the “merging” of multiple event traces, but it requires more
sophisticated substitutions (on behaviors) since we must also support
divergence, I/O events, and reacting behaviors. It does not require
stuttering and mumbling closure since we are only dealing with
sequential languages. The proximity between these two approaches
shows great promise toward combining these two techniques to
build compositional models for concurrent C-like languages.

Ghica  and  Tzevelekos  [8]  developed  a  system-level  semantics 
for  composing  C-like  program  modules.  They  also  used  external 
call  and  return  events  and  used  them  to  model  open  C-like  modules 
and  their  environments.  Our  work  can  be  viewed  as  an  adaptation 
of  their  idea  to  the  setting  of  compositional  compiler  correctness, 
with the goal of addressing language-independent behavior
specifications that include divergence, I/O and reactive events.

Compositional CompCert  Concurrently with our work, Stewart
et al. [19, 3] have recently completed the development of a formally
verified separate compiler for CompCert C. This is a very
impressive achievement since their Coq implementation includes
all 8 translation phases from CompCert Clight to CompCert x86
plus many of the optimization phases. They developed interaction
semantics, which is a protocol-oriented operational semantics of
intermodule (or thread) interaction: an open module would take normal
unobservable steps or make internal function calls (defined in
the same module), but would “block” when calling external functions;
each such “block” point is considered as an interaction point;
the program will resume execution when the external function call
returns. To support both vertical and horizontal composition, they
have also developed a new form of “structured simulations” which
extends CompCert-style memory injections with fine-grained
subjective invariants and a leakage protocol.

While our Extcall-event-based semantics (EES) shares many
similarities with Stewart et al.’s interaction semantics (IS), the two
also have some significant differences. EES does not rely on any new
“protocol-oriented” operational semantics; instead, it just treats
external function calls as regular events, so it can use the same
trace-based behavior specifications as semantic objects. When linking
two modules u0 and u1, our semantic linking operator ⋈ (under
EES) automatically calculates the resulting semantic objects
for the linked module (u0 ⋈ u1), replacing all cross-module calls
between u0 and u1 with their corresponding behavior specifications.
This leads to a very nice linking theorem (see Theorem 1 in Sec. 5):
if u0 and u1 are two modules in the same language, linking their
compositional semantics at the level of their behaviors exactly
corresponds to the compositional semantics of the syntactic
concatenation of the two modules. The interaction semantics (IS), on the
other hand, does not attempt to “big-step” the cross-module calls
between u0 and u1 during linking, and thus has not been able to prove
the same linking theorem.

Kripke logical relations  Kripke Logical Relations (KLRs) [17]
are designed to support horizontal composition for functional
languages. They define equivalence between terms (and values) in
such a way that two functions f1 and f2 (of the same type) are
equivalent if, and only if, for any two equivalent values v1, v2 of the
same type, (f1 v1) and (f2 v2) are equivalent. Ahmed et al. [1, 7] showed
how to generalize KLRs to reason about higher-order states. Hur
and Dreyer [9] rely on step-indexed logical relations to show how
to support horizontal composition; they prove correctness of a
one-pass compiler but they do not support vertical composition since
step-indexed logical relations are known not to be transitive.

C-like languages support both first-class function pointers and
states but they do not support first-class function terms as in most
functional languages. Because C function pointers can be tested for
equality, a function pointer cannot be replaced by another, even if
they point to functions that have the same observable behaviors. This
is why we can build much simpler semantic models and how our new
compositional semantics can still establish the monotonicity
(congruence) result of our refinement relation (Section 6, Theorem 2).

Parametric bisimulations  Hur et al. [10] recently proposed a
promising approach that combines KLRs with bisimulations. The
main idea is to abandon step-indexing and rely, instead, on
coinductive simulation-based techniques (which are closer to
CompCert-style simulation relations). More specifically, they propose to
parameterize the local knowledge of functions with the global
knowledge of external functions, and to define equivalence for open
modules based on a simulation diagram over the small-step semantics
of the two underlying languages of the programs. A simulation
diagram can make two equivalent programs perform several steps
from two equivalent states to two states corresponding to an
external function call, then resume simulation upon return of such a
call. This “disruption” in the flow of the simulation is analogous to
our way of making the external function call explicit as a specific
event in the behavior. Thus, our work can be seen as a unary
version of their parametric bisimulations: it defines a unary semantics
for open modules, but at the level of behaviors (independently
of the small-step semantics of the underlying languages). Our way
of defining the linking operator at the semantic level of behaviors
avoids the need for strong typing, which makes our approach more
amenable to supporting weakly typed C-like languages.

Conclusions  In this paper, we have presented a novel compositional
semantics for reasoning about open modules and for supporting
verified separate compilation and linking. To build compositional
semantics for open concurrent programs, we plan to split
our single Extcall event into separate call and return events.
Semantics for open concurrent programs can then have interleaving
external call and return events. Semantic substitutions in our linking
will be replaced by some form of “zipping” operations.
Abstract

Modern computer systems consist of a multitude of abstraction layers
(e.g., OS kernels, hypervisors, device drivers, network protocols),
each of which defines an interface that hides the implementation
details of a particular set of functionality. Client programs built on
top of each layer can be understood solely based on the interface,
independent of the layer implementation. Despite their obvious
importance, abstraction layers have mostly been treated as a system
concept; they have almost never been formally specified or verified.
This makes it difficult to establish strong correctness properties, and
to scale program verification across multiple layers.

In this paper, we present a novel language-based account of
abstraction layers and show that they correspond to a strong form
of abstraction over a particularly rich class of specifications which
we call deep specifications. Just as data abstraction in typed
functional languages leads to the important representation independence
property, abstraction over deep specification is characterized by an
important implementation independence property: any two
implementations of the same deep specification must have contextually
equivalent behaviors. We present a new layer calculus showing
how to formally specify, program, verify, and compose abstraction
layers. We show how to instantiate the layer calculus in realistic
programming languages such as C and assembly, and how to adapt
the CompCert verified compiler to compile certified C layers such
that they can be linked with assembly layers. Using these new
languages and tools, we have successfully developed multiple certified
OS kernels in the Coq proof assistant, the most realistic of which
consists of 37 abstraction layers, took less than one person-year to
develop, and can boot a version of Linux as a guest.

Categories and Subject Descriptors  D.2.4 [Software Engineering]:
Software/Program Verification - Correctness proofs, formal
methods; D.3.3 [Programming Languages]: Language Constructs
and Features; D.3.4 [Programming Languages]: Processors -
Compilers; D.4.5 [Operating Systems]: Reliability - Verification;
D.4.7 [Operating Systems]: Organization and Design - Hierarchical
design; F.3.1 [Logics and Meanings of Programs]: Specifying and
Verifying and Reasoning about Programs

Keywords  Abstraction  Layer;  Modularity;  Deep  Specification; 
Program  Verification;  Certified  OS  Kernels;  Certified  Compilers. 
1.  Introduction

Modern hardware and software systems are constructed using a
series of abstraction layers (e.g., circuits, microarchitecture, ISA
architecture, device drivers, OS kernels, hypervisors, network
protocols, web servers, and application APIs), each defining an interface
that hides the implementation details of a particular set of
functionality. Client programs built on top of each layer can be understood
solely based on the interface, independent of the layer
implementation. Two layer implementations of the same interface should behave
in the same way in the context of any client code.

The  power  of  abstraction  layers  lies  in  their  use  of  a  very  rich 
class  of  specifications,  which  we  will  call  deep  specifications  in  this 
paper.  A  deep  specification,  in  theory,  is  supposed  to  capture  the 
precise  functionality  of  the  underlying  implementation  as  well  as  the 
assumptions  which  the  implementation  might  have  about  its  client 
contexts.  In  practice,  abstraction  layers  are  almost  never  formally 
specified  or  verified;  their  interfaces  are  often  only  documented 
in  natural  languages,  and  thus  cannot  be  rigorously  checked  or 
enforced.  Nevertheless,  even  such  informal  instances  of  abstraction 
over  deep  specifications  have  already  brought  us  huge  benefits. 
Baldwin and Clark [?] attributed such use of abstraction, modularity,
and  layering  as  the  key  factor  that  drove  the  computer  industry 
toward  today’s  explosive  levels  of  innovation  and  growth  because 
complex  products  can  be  built  from  smaller  subsystems  that  can  be 
designed  independently  yet  function  together  as  a  whole. 

Abstraction and modularity have also been heavily studied in
the programming language community [31, 30]. The focus there is
on abstraction over “shallow” specifications. A module interface
in existing languages cannot describe the full functionality of its
underlying implementation; instead, it only describes type
specifications, augmented sometimes with simple invariants. Abstraction
over shallow specifications is highly desirable [?], but client
programs cannot be understood from the interface alone. This makes
modular verification of correctness properties impossible:
verification of client programs must look beyond the interface and examine
its underlying implementation, thus breaking the modularity.

Given their obvious importance, formalizing and verifying abstraction
layers is highly desirable, but doing so poses many challenges:

•  Lack  of  a  language-based  model.  It  is  unclear  how  to  model 
abstraction  layers  in  a  language-based  setting  and  how  they 
differ  from  regular  software  modules  or  components.  Each  layer 
seems  to  be  defining  a  new  “abstract  machine;”  it  may  take 
an  existing  set  of  mechanisms  (e.g.,  states  and  functions)  at  the 
layer  below  and  expose  a  different  view  of  the  same  mechanisms. 
For  example,  a  virtual  memory  management  layer — built  on  top 
of  a  physical  memory  layer —  would  expose  to  clients  a  different 
view  of  the  memory,  now  accessed  through  virtual  addresses. 

•  Lack of good language support. Programming an abstraction
layer formally, by its very nature, would require two languages:
one for writing the layer implementation (which, given the
low-level nature of many layers, often means a language like C or
assembly); another for writing the formal layer specification
(which, given the need to precisely specify full functionality,
often means a rich formal logic). It is unclear how to fit these
two different languages into a single setting. Indeed, many
existing formal specification languages [34, 18, ?] are capable
of building accurate models with rich specifications, but they are
not concerned with connecting to the actual running code.

•  Lack  of  compiler  and  linking  support.  Abstraction  layers  are 
often  deployed  in  binary  or  assembly.  Even  if  we  can  verify  a 
layer  implementation  written  in  C,  it  is  unclear  how  to  compile 
it  into  assembly  and  link  it  with  other  assembly  layers.  The 
CompCert verified compiler [?] can only prove the correctness
of  compilation  for  whole  programs,  not  individual  modules  or 
layers.  Linking  C  with  assembly  adds  a  new  challenge  since  they 
may  have  different  memory  layouts  and  calling  conventions. 

In this paper, we present a formal study of abstraction layers that
tackles all these challenges. We define a certified abstraction layer
as a triple (L1, M, L2) plus a mechanized proof object showing that
the layer implementation M, built on top of the interface L1 (the
underlay), indeed faithfully implements the desirable interface L2
above (the overlay). Here, the implements relation is often defined
as some simulation relation [22]. A certified layer can be viewed
as a “parameterized module” (from interfaces L1 to L2), à la an
SML functor [23]; but it enforces a stronger contextual correctness
property: a correct layer is like a “certified compiler,” capable of
converting any safe client program running on top of L2 into one
that has the same behavior but runs on top of L1 (e.g., by “compiling”
abstract primitives in L2 into their implementation in M).

A regular software module M (built on top of L1) with interface
L2 may not enjoy such a property because its client may invoke
another module M' which shares some states with M but imposes
different state invariants from those assumed by L2. An abstraction
layer does not allow such a client; instead, such an M' must be either
built on top of L2 (thus respecting the invariants in L2), or below
L2 (in which case, L2 itself must be changed).

Our  paper  makes  the  following  new  contributions: 

•  We present the first language-based account of certified abstraction
layers and show how they correspond to a rigorous form
of abstraction over deep specifications used widely in the systems
community. A certified layer interface describes not only
the precise functionality of any underlying implementation but
also clear assumptions about its client contexts. Abstraction over
deep specifications leads to the powerful implementation
independence property (see Sec. 2): any two implementations of the
same layer interface have contextually equivalent behaviors.

•  We present a new layer calculus showing how to formally specify,
program, verify, and compose certified abstraction layers (see
Sec. [?]). Such a layer language plays a role similar to that of the
module language in SML [23], but its interface checking is not just
typechecking or signature matching; instead, it requires formal
verification of the implements relation in a proof assistant.

•  We have instantiated the layer calculus on top of two core
languages (see Sec. [?]): ClightX, a variant of the CompCert
Clight language [3]; and LAsm, an x86 assembly language. Both
ClightX and LAsm can be used to program certified abstraction
layers. We use the Coq logic [35] to develop all the layer
interfaces. Each ClightX or LAsm layer is parameterized over its
underlay interface, implemented using CompCert’s external call
mechanisms. We developed new tools and tactic libraries to help
automate the verification of the implements relation.

•  We have also modified CompCert to build a new verified compiler,
CompCertX, that can compile ClightX abstraction layers
into LAsm layers (see Sec. [?]). CompCertX is novel because it
can prove a stronger correctness theorem for compiling individual
functions in each layer; such a theorem requires reasoning
about memory injection [?] between the memory states of the
source and target languages. To support linking between ClightX
and LAsm layers, we show how to design the implements relation
so that it is stable over memory injection.

•  Using these new languages and tools, we have successfully
constructed several feature-rich certified OS kernels in Coq (see
Sec. [?]). A certified kernel (L_x86, K, L_ker) is a verified LAsm
implementation K, built on top of L_x86, that implements the
set of system calls as specified in L_ker. The correctness of the
kernel guarantees that if a user program P runs safely on top
of L_ker, running the version of P linked with the kernel K on
L_x86 will produce the same behavior. All our certified kernels
are built by composing a collection of smaller layers. The most
realistic kernel consists of 37 layers, took less than one person-year
to develop, and can boot a version of Linux as a guest.

The  POPL  Artifact  Evaluation  Committee  reviewed  the  full  artifact 
of  our  entire  effort,  including  ClightX  and  LAsm,  the  CompCertX 
compiler,  and  the  implementation  of  all  certified  kernels  with  Coq 
proofs.  The  reviewers  unanimously  stated  that  our  implementation 
exceeded their expectations. Additional details about our work can
be found in the companion technical report [?].

2.  Why  abstraction  layers? 

In  this  section,  we  describe  the  main  ideas  behind  deep  specifications 
and  show  why  they  work  more  naturally  with  abstraction  layers  than 
with  regular  software  modules. 

2.1  Shallow  vs.  deep  specifications 

We  introduce  shallow  and  deep  specifications  to  describe  different 
classes  of  requirements  on  software  and  hardware  components. 
Type  information  and  program  contracts  are  examples  of  “shallow” 
specifications.  Type-based  module  interfaces  (e.g.,  ML  signatures) 
are  introduced  to  support  compositional  static  type  checking  and 
separate compilation: a module M can be typechecked based on its
import interface L1 (without looking at L1’s implementation), and
shown to have types specified in its export interface L2.

To support compositional verification of strong functional
correctness properties on a large system, we would hope that all of its
components are given “deep” specifications. A module M will be
verified based on its import interface L1 (without looking at L1’s
implementation), and shown to implement its export interface L2.

To  achieve  true  modularity,  we  would  like  to  reason  about 
the behaviors of M solely based on its import interface L1; and
we  would  also  like  its  export  interface  L2  to  describe  the  full 
functionality  of  M  while  omitting  the  implementation  details. 

More  formally,  a  deep  specification  captures  everything  we 
want  to  know  about  any  of  its  implementations — it  must  satisfy 
the  following  important  “implementation  independence”  property: 

Implementation  independence:  Any  two  implementations 
(e.g., M1 and M2) of the same deep specification (e.g., L)
should  have  contextually  equivalent  behaviors. 

Different languages may define such a contextual equivalence relation
differently, but regardless, we want that, given any whole-program
client P built on top of L, running P ⊕ M1 (i.e., P linked with M1)
should lead to the same observable result as running P ⊕ M2.

Without implementation independence, running P ⊕ M1 and
P ⊕ M2 may yield different observable results, so we can prove a
specific whole-program property that holds on P ⊕ M1 but not on
P ⊕ M2; such a whole-program property cannot be proved based on
the program P and the specification L alone.
    #include <stddef.h>  /* NULL */

    typedef enum {
      TD_READY, TD_RUN,
      TD_SLEEP, TD_DEAD
    } td_state;

    struct tcb {
      td_state tds;
      struct tcb *prev, *next;
    };

    struct tdq {
      struct tcb *head, *tail;
    };

    // κ_tcbp and κ_tdqp
    struct tcb tcbp[64];
    struct tdq tdqp[64];

    // κ_dequeue
    struct tcb *
    dequeue(struct tdq *q) {
      struct tcb *head, *next;
      struct tcb *pid = NULL;
      if (q == NULL)
        return pid;
      else {
        head = q->head;
        if (head == NULL)
          return pid;
        else {
          pid = head;
          next = head->next;
          if (next == NULL) {
            /* the queue held a single block: it becomes empty */
            q->head = NULL;
            q->tail = NULL;
          } else {
            /* the second block becomes the new head */
            next->prev = NULL;
            q->head = next;
          }
        }
      }
      return pid;
    }
    /* ... */


    Inductive td_state :=
    | TD_READY | TD_RUN
    | TD_SLEEP | TD_DEAD.

    Inductive tcb :=
    | TCBUndef
    | TCBV (tds: td_state)
           (prev next: Z).

    Inductive tdq :=
    | TDQUndef
    | TDQV (head tail: Z).

    Record abs := {tcbp: ZMap.t tcb;
                   tdqp: ZMap.t tdq}.

    Function σ_dequeue a i :=
      match (a.tdqp i) with
      | TDQUndef => None
      | TDQV h t =>
        if zeq h 0 then
          Some (a, 0)
        else
          match a.tcbp h with
          | TCBUndef => None
          | TCBV _ _ n =>
            if zeq n 0 then
              let q' := (TDQV 0 0) in
              Some (set_tdq a i q', h)
            else
              match a.tcbp n with
              | TCBUndef => None
              | TCBV s' _ n' =>
                let q' := (TDQV n t) in
                let a' := set_tdq a i q' in
                let b := (TCBV s' 0 n') in
                Some (set_tcb a' n b, h)
              end
          end
      end ...


Figure  1.  Concrete  (in  C)  vs.  abstract  (in  Coq)  thread  queues 


    Definition tcb := td_state.

    Definition tdq := List Z.

    Record abs' := {tcbp: ZMap.t tcb;
                    tdqp: ZMap.t tdq}.

    Function σ'_dequeue a i :=
      match (a.tdqp i) with
      | h :: q' =>
        Some (set_tdq a i q', h)
      | nil => None
      end.


Figure  2.  A  more  abstract  queue  (in  Coq) 

Hoare-style partial correctness specifications are rarely deep
specifications since they fail to satisfy implementation independence.
Given two implementations of a partial correctness specification for
a factorial function, one can return the correct factorial number and
another can just go into an infinite loop. A program built on top of
such a specification may not be reasoned about based on the
specification alone; instead, we have to peek into the actual
implementation in order to prove certain properties (e.g., termination).
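The factorial example can be made concrete with a small C sketch. The function names are hypothetical and the specification is stated informally in a comment; both functions satisfy the same partial correctness specification, yet a client built on top of them terminates with one and diverges with the other.

```c
#include <assert.h>

/* Partial correctness specification (informal):
   if the call returns, the result equals n!. */

/* Implementation 1: returns the correct factorial. */
unsigned long fact_terminating(unsigned n)
{
    unsigned long r = 1;
    for (unsigned i = 2; i <= n; i++)
        r *= i;
    return r;
}

/* Implementation 2: never returns, so the specification above
   holds vacuously. */
unsigned long fact_diverging(unsigned n)
{
    (void)n;
    for (;;) { }
}

/* Client P: whether it terminates cannot be decided from the
   partial correctness specification alone. */
unsigned long client(unsigned long (*fact)(unsigned))
{
    return fact(5) + 1;
}
```

Calling client(fact_terminating) returns, whereas client(fact_diverging) loops forever, so termination of the client depends on which implementation is linked in.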

In the rest of this paper, following CompCert [20], we will focus
on languages whose semantics are deterministic relative to external
events (formally, these languages are defined as both receptive
and determinate [33], and they support external nondeterminism
such as I/O and concurrency by making events explicit in the
execution traces). Likewise, we only consider interfaces whose
primitives have deterministic specifications. If L is a deterministic
interface, and both M1 and M2 implement L, then P ⊕ M1 and
P ⊕ M2 should have identical behaviors since they both follow the
semantics of running P over L, which is deterministic. Deterministic
specifications are thus also deep specifications.
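To illustrate implementation independence for a deterministic interface, the following C sketch links one client against two different implementations of the same bounded FIFO interface and records the observable results. Everything here is an assumption made for illustration (the interface queue_iface, the capacity CAP, the modules M1 and M2, and client_P); it is not taken from the certified kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAP   8
#define EMPTY (-1)

/* Interface L: a bounded FIFO queue with deterministic primitives. */
typedef struct {
    void (*reset)(void);
    int  (*enqueue)(int v);  /* 0 on success, -1 if the queue is full */
    int  (*dequeue)(void);   /* oldest element, or EMPTY */
} queue_iface;

/* M1: ring buffer. */
static int ring[CAP];
static int ring_head, ring_len;
static void m1_reset(void) { ring_head = 0; ring_len = 0; }
static int m1_enqueue(int v)
{
    if (ring_len == CAP) return -1;
    ring[(ring_head + ring_len) % CAP] = v;
    ring_len++;
    return 0;
}
static int m1_dequeue(void)
{
    if (ring_len == 0) return EMPTY;
    int v = ring[ring_head];
    ring_head = (ring_head + 1) % CAP;
    ring_len--;
    return v;
}

/* M2: flat array that shifts its contents on every dequeue. */
static int arr[CAP];
static int arr_len;
static void m2_reset(void) { arr_len = 0; }
static int m2_enqueue(int v)
{
    if (arr_len == CAP) return -1;
    arr[arr_len++] = v;
    return 0;
}
static int m2_dequeue(void)
{
    if (arr_len == 0) return EMPTY;
    int v = arr[0];
    memmove(arr, arr + 1, (size_t)(arr_len - 1) * sizeof arr[0]);
    arr_len--;
    return v;
}

const queue_iface M1 = { m1_reset, m1_enqueue, m1_dequeue };
const queue_iface M2 = { m2_reset, m2_enqueue, m2_dequeue };

/* Client P: uses only the interface and logs every observable result.
   Returns the number of trace entries written. */
int client_P(const queue_iface *q, int trace[], int max)
{
    int n = 0;
    q->reset();
    for (int v = 1; v <= 10 && n < max; v++)
        trace[n++] = q->enqueue(v);
    for (int k = 0; k < 10 && n < max; k++)
        trace[n++] = q->dequeue();
    return n;
}
```

Running client_P with M1 and with M2 yields the same trace, although the two modules keep entirely different private state.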

Deep specifications can, of course, also be nondeterministic.
They may contain resource bounds [6], numerical uncertainties [7],
etc. Such nondeterminism should be unobservable in the semantics
of a whole program, allowing implementation independence to
still hold. We leave the investigation of nondeterministic deep
specifications as future work.

Figure 3.  Client code with conflicting abstract states?

2.2  Layers  vs.  modules 

When  a  module  (or  a  software  component)  implements  an  interface 
with  a  shallow  specification,  we  often  hide  its  private  memory  state 
completely  from  its  client  code.  In  doing  so,  we  can  guarantee 
that  the  client  cannot  possibly  break  any  invariants  imposed  on  the 
private  state  in  the  module  implementation. 

If  a  module  implements  an  interface  with  a  deep  specification,  we 
would  still  hide  the  private  memory  state  from  its  client,  but  we  also 
need  to  introduce  an  abstract  state  to  specify  the  full  functionality 
of  each  primitive  in  the  interface. 

For example, Fig. 1 shows the implementation of a concrete
thread  queue  module  (in  C)  and  its  interface  with  a  deep  specification 
(in  Coq).  The  local  state  of  the  C  implementation  consists  of  64 
thread  queues  (tdqp)  and  64  thread  control  blocks  (tcbp).  Each 
thread  control  block  consists  of  the  thread  state,  and  a  pair  of  pointers 
(prev  and  next)  indicating  which  linked-list  queue  it  belongs  to.  The 
dequeue  function  takes  a  pointer  to  a  queue;  it  returns  the  head 
block  if  the  queue  is  not  empty,  or  null  if  the  queue  is  empty. 

In the Coq specification (Fig. 1, right; we omitted some invariants
to make it more readable), we introduce an abstract state of type
abs where we represent each C array as a Coq finite map (ZMap.t),
and each pointer as an integer index (Z) to the tdq or tcb array.
The dequeue primitive σ_dequeue is a mathematical function of type
abs → Z → option (abs × Z); when the function returns None, it
means that the abstract primitive faults. This dequeue specification
is intentionally made very similar to the C function, so we can easily
show that the C module indeed implements the specification.
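For readers more familiar with C than Coq, the index-based abstract state and its dequeue primitive can be transliterated roughly as follows. This is a hedged sketch rather than the authors' code: the names abs_state and spec_dequeue are hypothetical, the option result is encoded as a boolean plus an output parameter, the state is updated in place on success, and out-of-range indices are assumed to fault (a Coq ZMap is total, a C array is not).

```c
#include <assert.h>
#include <stdbool.h>

#define NUM 64

typedef enum { TD_READY, TD_RUN, TD_SLEEP, TD_DEAD } td_state;

/* defined == false plays the role of TCBUndef / TDQUndef. */
typedef struct { bool defined; td_state tds; int prev, next; } abs_tcb;
typedef struct { bool defined; int head, tail; } abs_tdq;
typedef struct { abs_tcb tcbp[NUM]; abs_tdq tdqp[NUM]; } abs_state;

static bool in_range(int x) { return x >= 0 && x < NUM; }

/* Returns false when the abstract primitive faults (None).
   On success, updates *a in place and stores the result in *out;
   index 0 plays the role of the null pointer. */
bool spec_dequeue(abs_state *a, int i, int *out)
{
    if (!in_range(i) || !a->tdqp[i].defined)
        return false;                        /* TDQUndef => None */
    int h = a->tdqp[i].head;
    if (h == 0) {                            /* empty queue: Some (a, 0) */
        *out = 0;
        return true;
    }
    if (!in_range(h) || !a->tcbp[h].defined)
        return false;                        /* TCBUndef => None */
    int n = a->tcbp[h].next;
    if (n == 0) {                            /* last element: queue := TDQV 0 0 */
        a->tdqp[i].head = 0;
        a->tdqp[i].tail = 0;
        *out = h;
        return true;
    }
    if (!in_range(n) || !a->tcbp[n].defined)
        return false;                        /* TCBUndef => None */
    a->tdqp[i].head = n;                     /* queue := TDQV n t (tail unchanged) */
    a->tcbp[n].prev = 0;                     /* new head has no predecessor */
    *out = h;
    return true;
}
```

All checks happen before any update, so a faulting call leaves the abstract state untouched, matching the functional reading in which None carries no new state.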

We define that a module implements a specification if there
is a forward simulation [?] from the module implementation
to its specification. In the context of determinate and receptive
languages [33, 20], if the specification is also deterministic, it is
sufficient to find a forward simulation from the specification to its
implementation (this is often easier to prove in practice).

In the rest of this paper, following CompCert, we often refer to the
forward simulation from the implementation to its specification as
the upward (forward) simulation, and to the one from the specification
to its implementation as the downward (forward) simulation.

Fig. 2 shows a more abstract specification of the same queue
implementation where the new abstract state abs’ omits the prev
and next links in tcb and treats each queue simply as a Coq list. The
dequeue specification is now even simpler, which makes it
easier to reason about its client, but it is now harder to prove that the
C module implements this more abstract specification. This explains
why we often introduce less abstract specifications (e.g., the one
in Fig. 1) as intermediate steps, so a complex abstraction can be
decomposed into several more tractable abstraction steps.

Deep specification brings out an interesting new challenge,
shown in Fig. 3: what if a program P attempts to call primitives
defined in two different interfaces L1 and L2, which may export two
conflicting views (i.e., abstract states abs1 and abs2) of the same
abstract state abs (and thus also the same concrete memory state mem)?

Here we assume that modules M, M1, M2 implement interfaces
L, L1, L2 via some simulation relations R, R1, R2 (lines marked
with a dot on one end) respectively. Clearly, calling primitives in L2
may violate the invariants imposed in L1, and vice versa, so L1 and
L2 are breaking each other's abstraction when we run P. In fact,
even without M2 and L2, if we allow P to directly call primitives
in L, similar violation of L1 invariants can also occur.

This  means  that  we  must  prohibit  client  programs  such  as  P 
above,  and  each  deep  specification  must  state  the  clear  assumptions 
about  its  valid  client  contexts.  Each  interface  should  come  with  a 
single  abstract  state  (abs)  used  by  its  primitives;  and  its  client  can 
only  access  the  same  abs  throughout  its  execution. 

This  is  what  abstraction  layers  are  designed  for  and  why  they  are 
more  compositional  (with  respect  to  deep  specification)  than  regular 
modules!  Layers  are  introduced  to  limit  interaction  among  different 
modules: only modules with identical state views (i.e., R1, R2 and
abs1, abs2 must be identical) can be composed horizontally.

A  layer  interface  seems  to  be  defining  a  new  “abstract  machine” 
because  it  only  supports  client  programs  with  a  particular  view  of  the 
memory  state.  The  correctness  of  a  certified  layer  implementation 
allows  us  to  transfer  formal  reasoning  (of  client  programs)  on  one 
abstract  machine  (the  overlay)  to  another  (the  underlay). 

Programming with certified abstraction layers enables a disciplined
way of composing a large number of components in a
complex  system.  Without  using  layers,  we  may  have  to  consider 
arbitrary  module  interaction  or  dependencies:  an  invariant  held  in 
one  function  can  be  easily  broken  when  it  calls  a  function  defined 
in  another  module.  A  layered  approach  aims  to  sort  and  isolate  all 
components  based  on  a  carefully  designed  set  of  abstraction  levels 
so  we  can  reason  about  one  small  abstraction  step  at  a  time  and 
eliminate  most  unwanted  interaction  and  dependencies. 

3.  A  calculus  of  abstraction  layers 

Motivation. A user of an abstraction layer (L1, M, L2) wants to
know that its implementation M (on top of the underlay interface
L1) can be used to run any program P written against the overlay
interface L2. If we consider L1, L2 as abstract machines and M
as a program transformation (which transforms a program P into
M(P)), then for some notion of refinement ⊑, this property can be
stated as ∀P. M(P)@L1 ⊑ P@L2, meaning that the behavior of
M(P) executing on top of the underlay specification L1 refines that
of the program P executing on top of the overlay specification L2.

This  view  of  abstraction  layers  captures  a  wide  variety  of 
situations. Furthermore, two layers (L1, M, L2) and (L2, N, L3)
can be composed as (L1, M ∘ N, L3), and the correctness of the
layer implementation M ∘ N follows from that of M and N.

However,  the  layer  interfaces  are  often  not  arbitrary  abstract 
machines,  but  simply  instances  of  a  base  language,  specialized  to 
provide layer-specific primitives and abstract state. The implementation
is not an arbitrary transformation, but instead consists of some
library  code  to  be  linked  with  the  client  program.  In  order  to  prove 
this  transformation  correct,  we  will  verify  the  implementation  of 
each  primitive  separately,  and  then  use  these  proofs  in  conjunction 
with  a  general  template  for  the  instrumented  language. 

Abstract  machines  and  program  transformations  are  too  general 
to  capture  this  redundant  structure.  The  layer  calculus  presented  in 
this  section  provides  fine-grained  notions  of  layer  interfaces  and 
implementations.  It  allows  us  to  describe  what  varies  from  one  layer 
to  the  next  and  to  assemble  such  layers  in  a  generic  way. 

3.1  Prerequisites 

To keep the formalism general and simple, we initially take the
syntax and behavior of the programs under consideration to be
abstract parameters. Specifically, in the remainder of this section we
will assume that the following are given:

•  a set of identifiers i ∈ I which will be used to name variables,
functions, and primitives (e.g., dequeue and tcbp in Fig. 1);

•  sets of function definitions κ ∈ K, and variable definitions ν ∈ T,
as specified by the language (e.g., κ_dequeue and ν_tcbp in Fig. 1);

•  a set of behaviors σ ∈ Σ, for the individual primitives of layers,
and the individual functions of programs (e.g., the step relation
σ_dequeue derived from the Coq function σ_dequeue in Fig. 1).

More  examples  can  be  found  in  Sec.|^ 

We  also  need  to  define  how  the  behaviors  refine  one  another. 
This  is  particularly  important  because  our  layer  interfaces  bundle 
primitive specifications, and because a relation between layer interfaces
is defined pointwise over these primitives. Ultimately, we wish
to  use  these  fine-grained  layers  and  refinements  to  build  complete 
abstract  machines  and  whole-machine  simulations.  This  can  only  be 
done  if  the  refinements  of  individual  primitives  are  consistent;  for 
example,  if  they  are  given  in  terms  of  the  same  simulation  relation. 

Hence, we index behavior refinement by the elements of a partial
monoid (ℝ, ∘, id). We will refer to the elements R ∈ ℝ of this
monoid as simulation relations. However, note that at this stage, the
elements of ℝ are entirely abstract, and we require only that the
composition operator ∘ and identity element id satisfy the monoid
laws R ∘ (S ∘ T) = (R ∘ S) ∘ T and R ∘ id = id ∘ R = R.

Finally, we need to interpret these abstract simulation relations as
refinement relations between behaviors. That is, for each R ∈ ℝ, we
require a relation ≤_R on Σ. For instance, if the behaviors σ1, σ2 ∈ Σ
are taken to be step relations over some sets of states, σ1 ≤_R σ2
may be interpreted as a simulation diagram with the following reading:
whenever two states s1, s2 are related by R in some sense,
and σ1 takes s1 to s1′ in one step, then there exists s2′ such that σ2
takes s2 to s2′ in zero or more steps, and s1′ and s2′ are also related
by R. The relations ≤_R should respect the monoid structure of ℝ, so
that for any σ ∈ Σ we have σ ≤_id σ, and so that whenever R, S ∈ ℝ
and σ1, σ2, σ3 ∈ Σ are such that σ1 ≤_R σ2 and σ2 ≤_S σ3, it should
be the case that σ1 ≤_{R∘S} σ3.
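
For finite step relations, the simulation reading of σ1 ≤_R σ2 can be checked mechanically. The C sketch below is illustrative only: the state count N, the boolean-matrix encoding of step relations and of R, and the function names closure and simulates are assumptions made for this example and are not part of the calculus.

```c
#include <assert.h>
#include <stdbool.h>

#define N 4  /* number of states in each toy machine (assumption) */

/* reach := reflexive-transitive closure of step (Warshall's algorithm). */
static void closure(bool step[N][N], bool reach[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            reach[i][j] = (i == j) || step[i][j];
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = true;
}

/* True iff every single step of sigma1 from an R-related pair of states
   is matched by zero or more steps of sigma2 ending in R-related states. */
bool simulates(bool sigma1[N][N], bool sigma2[N][N], bool rel[N][N]) {
    bool reach2[N][N];
    closure(sigma2, reach2);
    for (int s1 = 0; s1 < N; s1++)
        for (int s2 = 0; s2 < N; s2++) {
            if (!rel[s1][s2])
                continue;
            for (int t1 = 0; t1 < N; t1++) {
                if (!sigma1[s1][t1])
                    continue;
                bool matched = false;
                for (int t2 = 0; t2 < N; t2++)
                    if (reach2[s2][t2] && rel[t1][t2])
                        matched = true;
                if (!matched)
                    return false;
            }
        }
    return true;
}
```

For instance, a single step 0 → 1 of σ1 is matched by the two steps 0 → 2 → 3 of σ2 when R relates 0 to 0 and 1 to 3; dropping the second pair from R makes the check fail. With R the identity relation, any σ simulates itself, which mirrors the requirement σ ≤_id σ.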


3.2  Layer  interfaces  and  modules 

The  syntax  of  the  calculus  is  defined  as  follows: 

L ::= ∅ | i ↦ σ | L1 ⊕ L2
M ::= ∅ | i ↦ κ | i ↦ ν | M1 ⊕ M2

The layer interfaces L and modules M are essentially finite maps;
constructions of the form i ↦ _ are elementary single-binding
objects, and ⊕ computes the union of two layers or modules. This
is illustrated by the proof-of-concept interpretation given in the
companion technical report [13]. For example, the thread queue
module, shown in Fig. 1, can be defined as M_thread_queue := tcbp ↦
ν_tcbp ⊕ tdqp ↦ ν_tdqp ⊕ dequeue ↦ κ_dequeue, while the overlay
interface can be defined as L_thread_queue := dequeue ↦ σ_dequeue.

The rules are presented in Fig. 4. The inclusion preorder ⊆ defined
on modules corresponds to the intuition that when M ⊆ N,
any definition present in M must be present in N as well. The
composition operator ⊕ behaves like a join operator. However, while
M ⊕ N is an upper bound of M and N, we do not require it to
be the least upper bound. The order ≤_R on layer interfaces extends the
underlying simulation preorder on behaviors. Compared to ⊆,
it should satisfy the additional property LLe-Idempotent.

Figure 4. The fine-grained layer calculus (legible rules: MLe-Refl, M ⊆ M;
MLe-Empty, ∅ ⊆ M; MLe-Id-Right, M ⊕ ∅ ⊆ M; associativity,
(M1 ⊕ M2) ⊕ M3 ⊆ M1 ⊕ (M2 ⊕ M3); LLe-Mon; Var; Vcomp)
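
The finite-map reading of modules can be made concrete with a small C sketch. Everything here is an assumption made for illustration: definitions are represented as strings, the capacity MAX_BINDINGS is arbitrary, and compose is made partial by rejecting conflicting definitions for the same identifier, which is one possible choice that the calculus itself does not prescribe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 16  /* arbitrary capacity for the sketch */

struct binding { const char *id; const char *def; };
struct module  { int n; struct binding b[MAX_BINDINGS]; };

/* Definition bound to id in m, or NULL if id is unbound. */
const char *lookup(const struct module *m, const char *id) {
    for (int k = 0; k < m->n; k++)
        if (strcmp(m->b[k].id, id) == 0)
            return m->b[k].def;
    return NULL;
}

/* Inclusion M ⊆ N: every definition present in M is present in N. */
bool included(const struct module *m, const struct module *n) {
    for (int k = 0; k < m->n; k++) {
        const char *d = lookup(n, m->b[k].id);
        if (d == NULL || strcmp(d, m->b[k].def) != 0)
            return false;
    }
    return true;
}

/* out := M ⊕ N as a union of bindings. Returns false when the two
   modules bind the same identifier differently or capacity is exceeded. */
bool compose(const struct module *m, const struct module *n,
             struct module *out) {
    *out = *m;
    for (int k = 0; k < n->n; k++) {
        const char *d = lookup(out, n->b[k].id);
        if (d != NULL) {
            if (strcmp(d, n->b[k].def) != 0)
                return false;   /* conflicting definitions */
            continue;           /* identical binding already present */
        }
        if (out->n == MAX_BINDINGS)
            return false;
        out->b[out->n++] = n->b[k];
    }
    return true;
}
```

Composing the variable bindings for tcbp and tdqp with a function binding for dequeue yields a three-binding module that includes both operands, so it is an upper bound of the two in the inclusion preorder.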

The judgment L1 ⊢_R M : L2 is akin to a typing judgment for
modules. It asserts that, using the simulation relation R, the module
M, running on top of L1, faithfully implements L2. Because
modules consist of code ultimately intended to be linked with a client
program, the empty module ∅ acts as a unit, and can implement any
layer interface L (Empty). Moreover, appending first N, then M to
a client program is akin to appending M ⊕ N in one step (Vcomp).
These  rules  correspond  to  the  identity  and  composition  properties 
already  present  in  the  framework  of  abstract  machines  and  program 
transformations.  However,  the  fine-grained  calculus  also  provides  a 
way  to  split  refinements  (Hcomp):  when  two  different  layer  interfaces 
are  implemented  in  a  compatible  way  by  two  different  modules  on 
top  of  a  common  underlay  interface,  then  the  union  of  the  two 
modules  implements  the  union  of  the  two  interfaces. 

This  allows  us  to  break  down  the  problem  of  verifying  a  layer 
implementation  in  smaller  pieces,  but  ultimately,  we  need  to  handle 
individual  functions  and  primitives.  The  consequence  rule  (Conseq) 
can  be  used  to  tie  our  notion  of  behavior  refinement  into  the  calculus. 
However,  to  make  the  introduction  of  certified  code  possible,  we 
need  a  semantics  of  the  underlying  language. 


3.3  Language semantics

Assume that layers and modules are interpreted in the respective sets
L and M. The semantics of a module can be understood as the effect
its code has on the underlay interface, as specified by a function
⟦−⟧ : M → L → L. Given such a function, we can interpret the
typing judgment as:

L1 ⊢_R M : L2  ⇔  L2 ≤_R L1 ⊕ ⟦M⟧L1.

Then the properties in Fig. 5 are sufficient to ensure the soundness
of the typing rules with respect to this interpretation.

Figure 5. Semantics of modules: ⟦−⟧ : M → (L → L), with properties
Sem-Var, Sem-Comp (⟦M⟧(L ⊕ ⟦N⟧L) ≤_id ⟦M ⊕ N⟧L), and Sem-Mon

Here, surprisingly, we require that the specification refine the
implementation! This is because our proof technique involves
turning such a downward simulation into the converse upward
simulation, as detailed in Theorem 1 and Sec. 4.3. Also, we
included L1 on the right-hand side of ≤_R to support pass-through
of primitives in the underlay L1 into the overlay L2.

The property Sem-Comp can be understood intuitively as follows.
In ⟦M⟧(L ⊕ ⟦N⟧L), the code of M is able to use the functions
defined in N in addition to the primitives of the underlay interface
L, but conversely the code of N cannot access the functions of
M. However, in ⟦M ⊕ N⟧L, the functions of M and N can call
each other freely, and therefore the result should be more defined.
The property Sem-Mon states that making the module and underlay
larger should also result in a more defined semantics.

Once a language semantics is given, we introduce a language-specific
rule to prove the correctness of individual functions:

VC(L, κ, σ)
------------------------ Fun
L ⊢_id i ↦ κ : i ↦ σ

where the language-specific predicate VC(L, κ, σ) asserts that the
function body κ faithfully implements the primitive behavior σ on
top of L. This rule can be combined with the rules of the calculus to
build up complete certified layer implementations.

Similarly, given a concrete language semantics, we will want to
tie the calculus back into the framework of abstract machines and
program transformations. For a layer interface L, we will define a
corresponding abstract machine meant to execute programs written
in a version of the language augmented with the primitives specified
in L. The program transformation associated with a module M will
simply concatenate the code of M to the client program. Then, for
a particular notion of refinement ⊑, we will want to prove that the
typing judgments entail the contextual refinement property:

L1 ⊢_R M : L2
------------------------------
∀P. (P ⊕ M)@L1 ⊑ P@L2

Informally, if M faithfully implements L2 on top of L1, then
invocations in P of a primitive i with behavior σ in L2 can be
satisfied by calling the corresponding function κ in M.

Indeed, in later sections, the primitive specifications in
⟦M⟧L, based on step relations, are defined to reflect the possible
executions of the function definitions in M. Therefore, L2 ≤_R
L1 ⊕ ⟦M⟧L1 implies that, for any primitive implementation in M,
the corresponding deep specification in L2 refines the execution of
that function definition. Hence the execution of program P with
underlay L2 refines that of P ⊕ M with underlay L1 (the properties
enumerated in Fig. 5 hold for a similar reason). Properties of the
language (i.e., being determinate and receptive) can then be used to
reverse this refinement into the desired (P ⊕ M)@L1 ⊑ P@L2.


4.  Layered  programming  in  ClightX 

In  this  section,  we  provide  an  instantiation  of  our  framework  for  a 
C-like  language.  This  instantiation  serves  two  purposes:  it  illustrates 
a  common  use  case  for  our  framework,  showing  its  usability  and 
practicality; and it shows that our framework can add modularization
and  proof  infrastructure  to  existing  language  subsets  at  minimal  cost. 

Our starting point: CompCert Clight. Clight [5] is a subset of
C  and  is  formalized  in  Coq  as  part  of  the  CompCert  project.  Its 
formal  semantics  relies  on  a  memory  model  ED  that  is  not  only 
realistic  enough  to  specify  C  pointer  operations,  but  also  designed  to 
simplify  reasoning  about  non-aliasing  of  different  variables.  From 
the  programmer’s  point  of  view,  Clight  avoids  most  pitfalls  and 
peculiarities  of  C  such  as  nondeterminism  in  expressions  with  side 
effects.  On  the  other  hand,  Clight  allows  for  pointer  arithmetic  and 
is  a  true  subset  of  C.  Such  simplicity  and  practicality  turn  Clight 
into a solid choice for certified programming. However, Clight
provides  little  support  for  abstraction,  and  proving  properties  about 
a  Clight  program  requires  intricate  reasoning  about  data  structures. 
This  issue  is  addressed  by  our  layer  infrastructure. 

4.1  Abstract  state,  primitives,  and  layer  interfaces 

We enable abstraction in Clight and other CompCert languages by
instrumenting the memory states used by their semantics with an
abstract state component. This abstract state can be manipulated using
primitives, which are made available through CompCert's external
function mechanism. We call the resulting language ClightX.

Abstract state and external functions. The abstract state is not
just a ghost state for reasoning: it does influence the outcome
of executions! However, we seek to minimize its impact on the
existing proof infrastructure for program and compiler verification.
We do not modify the semantics of the basic operations of Clight,
or the type of values it uses. Instead, the abstract state is accessed
exclusively through Clight's external function mechanism.

Primitives and layer interfaces. CompCert offers a notion of
external functions, which are useful in modeling interaction with the
environment, such as input/output. Indeed, CompCert models
compiler correctness through traces of events which can be generated
only by external functions. CompCert axiomatizes the behaviors
of  external  functions  without  specifying  them,  and  only  assumes 
they  do  not  behave  in  a  manner  that  violates  compiler  correctness. 
We  use  the  external  function  mechanism  to  extend  Clight  with  our 
primitive  operations,  and  supply  their  specifications  to  make  the 
semantics  of  external  functions  more  precise. 

Definition 1 (Primitive specification). Let mem denote the type
of memory state, and let val denote the type of concrete values.
A primitive specification σ over the abstract state type A is a
predicate on (val* × mem × A) × (val × mem × A): when
σ(args, m, a, res, m′, a′) holds, we say that the primitive takes
arguments args, memory state m and abstract state a, and returns
a result res, a memory state m′ and an abstract state a′.

The  type  of  abstract  state  and  the  set  of  available  primitives  will 
constitute  our  notion  of  layer  interface. 

Definition  2  (Layer  interface).  A  layer  interface  L  is  a  tuple 
L  =  (A,  P)  where  A  is  the  type  of  abstract  state,  and  P  is  the  set  of 
primitives  as  a  finite  map  from  identifiers  to  primitive  specifications 
over  the  abstract  state  A. 
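
Definitions 1 and 2 can be pictured with the following C sketch. It is a simplification under stated assumptions: a primitive specification is a relation in the text, whereas here it is modeled as a partial function that returns false when it has no outcome; the types val, struct mem, and struct abs are toy stand-ins, and the names prim_spec, tick_spec, L_demo, and call_primitive are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int val;                 /* stand-in for concrete values */
struct mem { int cells[8]; };    /* toy memory state (assumption) */
struct abs { int counter; };     /* toy abstract state type A (assumption) */

/* Primitive specification sigma(args, m, a, res, m', a'), modeled as a
   partial function: returns false when the primitive has no outcome. */
typedef bool (*prim_spec)(const val *args, size_t nargs,
                          const struct mem *m, const struct abs *a,
                          val *res, struct mem *m2, struct abs *a2);

/* A layer interface L = (A, P): P maps identifiers to specifications. */
struct primitive       { const char *id; prim_spec spec; };
struct layer_interface { size_t n; const struct primitive *prims; };

/* Hypothetical primitive: returns the counter and increments it in the
   abstract state; concrete memory is left unchanged. */
static bool tick_spec(const val *args, size_t nargs,
                      const struct mem *m, const struct abs *a,
                      val *res, struct mem *m2, struct abs *a2) {
    (void)args;
    if (nargs != 0)
        return false;
    *res = a->counter;
    *m2 = *m;
    a2->counter = a->counter + 1;
    return true;
}

static const struct primitive demo_prims[] = { { "tick", tick_spec } };
const struct layer_interface L_demo = { 1, demo_prims };

/* Look up identifier id in L and apply its specification. */
bool call_primitive(const struct layer_interface *L, const char *id,
                    const val *args, size_t nargs,
                    const struct mem *m, const struct abs *a,
                    val *res, struct mem *m2, struct abs *a2) {
    for (size_t k = 0; k < L->n; k++)
        if (strcmp(L->prims[k].id, id) == 0)
            return L->prims[k].spec(args, nargs, m, a, res, m2, a2);
    return false;   /* identifier is not a primitive of L */
}
```

The caller never inspects struct abs directly; it only sees the result value and the updated states returned by the primitive, which is the sense in which the abstract state is reachable only through the interface.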

4.2  The  ClightX  parametric  language 

Syntax. The syntax of ClightX (parameterized over a layer interface
L) is identical to that of Clight. It features global variables
(including function pointers), stack-allocated local variables, and
temporary variables t. Expressions have no side effects; in particular,
they cannot contain any function call. They include full-fledged
pointer arithmetic (comparison, offset, C-style “arrays”).

e ::= n | x | t                   Constant, variable, temporary
    | &e | *e | e1 op e2 | ...


Statements  include  assignment  to  a  memory  location  or  a  temporary, 
function  call  and  return,  and  structured  control  (loops,  etc.). 

S ::= e1 = e2                     Assignment to a memory location
    | t := e                      Assignment to a temporary variable
    | t ← e(e1, ...)              Function call
    | return(e)                   Function return
    | S1; S2 | if(e) S1 else S2 | while(e) S

Function  calls  may  refer  to  internal  functions  defined  as  part  of 
a module, or to primitives defined in the underlay L. However,
these  two  cases  are  not  distinguished  syntactically.  In  fact,  the  layer 
calculus  allows  for  replacing  primitive  specifications  with  actual 
code  implementation,  with  no  changes  to  the  caller’s  code. 

Definition 3 (Functions, modules). A ClightX function is a tuple
κ = (targs, lvars, S), where targs is the list of temporaries to
receive the arguments, lvars is the list of local stack-allocated
variables with their sizes, and S is a statement, the function body. A
module M is a finite map from identifiers to ClightX functions.

Semantics. Compared with Clight, the semantics of ClightX(L)
adds a notion of abstract state, and permits calls to the primitives
of L. We will write L(i)(args, m, a, res, m′, a′) to denote the
semantics of the primitive associated with identifier i in L.

We present the semantics of ClightX in the form of a big-step
semantics. We fix an injective mapping Γ from global variables to
memory block identifiers. We write ⟦e⟧(l, τ, m) for the evaluation
of expression e under local variables l, temporaries τ and memory
state m. We write Γ, L, M, l ⊢ S : (τ, m, a) ⇓ (res; τ′, m′, a′)
for the semantics of statements: from the local environment l, the
temporary environment τ, the memory state m, and the abstract
state a, execution of S terminates and yields result res (or · if no
result), temporary environment τ′, memory state m′, and abstract
state a′. For instance, the rule for return statements is:

⟦e⟧(l, τ, m) = res
------------------------------------------------------
Γ, L, M, l ⊢ return(e) : (τ, m, a) ⇓ (res; τ, m, a)

We write Γ, L, M ⊢ f : (args; m, a) ⇓ (res; m′, a′) to say
that a function f defined either as an internal function in the module
M, or as a primitive in the layer interface L, called with list of
arguments args, from memory state m and abstract state a, returns
result res, memory m′ and abstract state a′.

For  internal  function  calls,  we  first  initialize  the  temporary 
environment  with  the  arguments,  and  allocate  the  local  variables  of 
the  callee  (next(m)  denotes  the  next  available  block  identifier  in 
memory  m,  not  yet  allocated).  Then,  we  execute  the  body.  Finally, 
we  deallocate  the  stack-allocated  variables  of  the  callee. 

M(f) = ((t1, ..., tn), ((x1, sz1), ..., (xk, szk)), S)
m1 = alloc(szk) ∘ ... ∘ alloc(sz1)(m)
l = ∅[x1 ← next(m)] ... [xk ← next(m) + k − 1]
τ = ∅[t1 ← v1] ... [tn ← vn]
Γ, L, M, l ⊢ S : (τ, m1, a) ⇓ (res; τ′, m2, a′)
m′ = free(next(m), sz1) ∘ ... ∘ free(next(m) + k − 1, szk)(m2)
------------------------------------------------------
Γ, L, M ⊢ f : (v1, ..., vn; m, a) ⇓ (res; m′, a′)

For  primitive  calls,  we  simply  query  the  layer  interface  L: 

L(f)(args, m, a, res, m′, a′)
------------------------------------------------------
Γ, L, M ⊢ f : (args; m, a) ⇓ (res; m′, a′)

Using  the  function  judgment,  we  can  state  the  rule  for  function  call 
statements  as: 

∀i, ⟦ei⟧(l, τ, m) = vi        ⟦e⟧(l, τ, m) = (b, 0)
Γ(f) = b        Γ, L, M ⊢ f : (v1, ..., vn; m, a) ⇓ (res; m′, a′)
τ′ = τ[t ← res]
------------------------------------------------------
Γ, L, M, l ⊢ t ← e(e1, ..., en) : (τ, m, a) ⇓ (·; τ′, m′, a′)
(1) ∀i. L1 ⊢_id i ↦ κ_i : i ↦ σ′_i, hence L1 ⊢_id M : L1′
(2) ∀i. σ_i ≤_R σ′_i, hence ∀i. i ↦ σ_i ≤_R i ↦ σ′_i, hence L2 ≤_R L1′
(3) from (1) and (2): L1 ⊢_R M : L2

where L1 is the underlay, the module M = ⊕_i i ↦ κ_i, the intermediate
layer L1′ = ⊕_i i ↦ σ′_i, and the overlay L2 = ⊕_i i ↦ σ_i.

Figure  6.  Building  a  certified  ClightX  layer 


Figure  7.  Layer  simulation  relation 


The full semantics of ClightX is given in the companion TR [13].

Definition 4 (Semantics of a module). Let M be a ClightX module,
and L be a layer interface. Let Γ be a mapping from global variables
to memory blocks. The semantics of a module M in ClightX(L),
written ⟦M⟧L, is the layer interface defined as follows:

•  the type of abstract state is the same as in L;

•  the semantics of primitives are defined by the following rule:

f ∈ dom(M)        Γ, L, M ⊢ f : (args; m, a) ⇓ (res; m′, a′)
------------------------------------------------------
(⟦M⟧L)(f)(args, m, a, res, m′, a′)

4.3  Layered  programming  and  verification 

To construct a certified abstraction layer (L1, M, L2), we need to
find a simulation R such that L1 ⊢_R M : L2 holds. Fig. 6 gives
an overview of this process. We write M = ⊕_i i ↦ κ_i, where
i ranges over the function identifiers defined in module M, and
κ_i is the corresponding implementation. Global variables in M
should not be accessible from the layers above: their permissions are
removed in the overlay interface L2. The interface L2 also includes
a specification σ_i for each function i defined in M.

We decouple the task of code verification from that of data
structure abstraction. We introduce an intermediate layer interface,
L1′ = ⊕_i i ↦ σ′_i, with its specifications σ′_i expressed in terms
of the underlay states. We first prove that L1 ⊢_id M : L1′
holds. For each function i in M, we show that its implementation
κ_i is a downward simulation of its “underlay” specification σ′_i,
that is, L1 ⊢_id i ↦ κ_i : i ↦ σ′_i. We apply the Hcomp rule
to compose all the per-function simulation statements. Note the
simulation relations here are all id, meaning there is no abstraction
of data structures in these steps. We then prove L2 ≤_R L1′, which
means that each specification σ_i in L2 is an abstraction of the
intermediate specification σ′_i via a simulation relation R. From
i ↦ σ_i ≤_R i ↦ σ′_i, we apply the monotonicity rule LLe-Mon
to get L2 ≤_R L1′. Finally, we apply the Conseq rule to deduce
L1 ⊢_R M : L2.


typedef enum {
    PG_RESERVED, PG_KERNEL,
    PG_NORMAL
} pg_type;

struct page_info {
    pg_type t;
    uint u;
};

struct page_info AT[1 << 20];


Notation RESV := 0.
Notation KERN := (RESV + 1).
Notation NORM := (KERN + 1).

Inductive page_info :=
| ATV (t: Z) (u: Z)
| ATUndef.

Record abs'' :=
  {AT: ZMap.t page_info}.

Figure  8.  Concrete  (C)  vs.  abstract  (Coq)  memory  allocation  table 

// κ_at_get
uint at_get(uint i) {
    uint allocated;
    allocated = AT[i].u;
    if (allocated != 0)
        allocated = 1;
    return allocated;
}

// κ_at_set
void at_set(uint i, uint b) {
    AT[i].u = b;
}


Function σ_at_get a i :=
  match (a.AT i) with
  | ATV _ 0 => Some 0
  | ATV _ _ => Some 1
  | _ => None
  end.

Function σ_at_set a i b :=
  match (a.AT i) with
  | ATV t _ =>
      Some (set_AT a i (ATV t b))
  | _ => None
  end.

Figure  9.  Concrete  vs.  abstract  getter-setter  functions  for  AT 
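
The correspondence between the two halves of the figure can be exercised in plain C. The sketch below is illustrative only: it shrinks the table to NPAGES entries, models the abstract table as an array of optional (t, u) pairs, and introduces the hypothetical names spec_at_get, spec_at_set, r_abs, and init_tables; memory permissions are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int uint;
#define NPAGES 16u   /* small table for illustration; the paper uses 1 << 20 */

typedef enum { PG_RESERVED, PG_KERNEL, PG_NORMAL } pg_type;
struct page_info { pg_type t; uint u; };
struct page_info AT[NPAGES];                /* concrete table (underlay) */

/* Abstract table: entry i is either ATV t u (defined) or ATUndef. */
struct abs_page  { bool defined; uint t; uint u; };
struct abs_state { struct abs_page AT[NPAGES]; };

/* Concrete getter and setter, following the C side of the figure. */
uint at_get(uint i) {
    uint allocated = AT[i].u;
    if (allocated != 0)
        allocated = 1;
    return allocated;
}
void at_set(uint i, uint b) { AT[i].u = b; }

/* Abstract getter: false plays the role of None. */
bool spec_at_get(const struct abs_state *a, uint i, uint *res) {
    if (i >= NPAGES || !a->AT[i].defined)
        return false;
    *res = (a->AT[i].u == 0) ? 0u : 1u;
    return true;
}

/* Abstract setter: updates u and keeps t; false plays the role of None. */
bool spec_at_set(struct abs_state *a, uint i, uint b) {
    if (i >= NPAGES || !a->AT[i].defined)
        return false;
    a->AT[i].u = b;
    return true;
}

/* Data part of the relation: abstract entry i is ATV AT[i].t AT[i].u. */
bool r_abs(const struct abs_state *a) {
    for (uint i = 0; i < NPAGES; i++)
        if (!a->AT[i].defined || a->AT[i].t != (uint)AT[i].t ||
            a->AT[i].u != AT[i].u)
            return false;
    return true;
}

/* Put both tables in matching initial states (all pages normal, free). */
void init_tables(struct abs_state *a) {
    for (uint i = 0; i < NPAGES; i++) {
        AT[i].t = PG_NORMAL;
        AT[i].u = 0;
        a->AT[i].defined = true;
        a->AT[i].t = (uint)PG_NORMAL;
        a->AT[i].u = 0;
    }
}
```

Starting from related states, performing at_set on the concrete table and spec_at_set on the abstract one with the same arguments leaves the two related again, and the two getters then agree on every index.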

Inductive σ′_at_set :=
| ∀ m m' a ofs v n,
    m.store AT ofs v = m'
    -> ofs = n * 8 + 4
    -> 0 <= n < 1048576
    -> σ′_at_set (n::v::nil)
         m a Vundef m' a.

Inductive σ_at_set :=
| ∀ m a a' n v,
    (* the function applied here is the Coq function of Fig. 9 *)
    σ_at_set a n v = Some a'
    -> 0 <= n < 1048576
    -> σ_at_set (n::v::nil)
         m a Vundef m a'.

Figure  10.  High  level  and  low  level  specification  for  at_set 


Verifying ClightX functions. L1 and L1′ share the same views of
both concrete and abstract states, so no simulation relation is
involved during this step of verification (the Fun rule in Sec. 3.3).
Using Coq's tactical language, we have developed a proof automation
engine that can handle most of the functional correctness proofs
of ClightX programs. It contains two main parts: a ClightX
statement/expression interpreter that generates the verification conditions
by utilizing rules of ClightX big-step semantics, and an automated
theorem prover that discharges the generated verification conditions
on the fly. The automated theorem prover is a first order prover,
extended with different theory solvers, such as the theory of integer
arithmetic and the theory of CompCert style partial maps. The entire
automation engine is developed in Coq's Ltac language.

Data abstraction. Since primitives in L1′ and L2 are atomic, we
prove the single-step downward simulation between L1′ and L2 only
at the specification level. The simulation proof for the abstraction
can be made language independent. The simulation relation R
captures the relation between the underlay state (concrete memory
and abstract state) and the overlay state, and can be decomposed
as R_mem and R_abs (see Fig. 7). The relation R_mem ensures that the
concrete memory states m1 and m2 contain the same values, while
making sure the memory permissions for the part to be abstracted
are erased in the overlay memory m2. The component R_abs relates
the overlay abstract state a2 with the full underlay state (m1, a1).

Through this decomposition, we achieve the following two
objectives: the client program can directly manipulate the abstract
state without worrying about its underlying concrete implementation
(which is hidden via R_mem), and the abstract state in the overlay is
actually implementable by the concrete memory and abstract state
in the underlay (via R_abs).


Common patterns. We have developed two common design patterns
to further ease the task of verification. The getter-setter pattern
establishes memory abstraction by introducing new abstract states
and erasing the corresponding memory permissions for the overlay.
The overlay only adds the get and set primitives which are
implemented using simple memory load/store operations at the underlay.
The abs-fun pattern implements key functionalities, but does not
introduce new abstract state. Its implementation (on underlay) does
not touch concrete memory state. Instead, it only accesses the states
that have already been abstracted, and it only does so using the
primitives provided by the underlay interface.


// κ_palloc
uint palloc(uint nps) {
    uint i = 0, u;
    uint freei = nps;
    while (freei == nps
           && i < nps) {
        u = at_get(i);
        if (u == 0)
            freei = i;
        i++;
    }
    if (freei != nps)
        at_set(freei, 1);
    return freei;
}

Figure 11. Concrete (in C) vs. abstract (in Coq) palloc function


Inductive σ′_palloc : spec :=
| ∀ m a a' nps n,
    (* the function applied here is the Coq function of Fig. 11 *)
    σ_palloc a nps = (a', n)
    -> 0 <= nps < 1048576
    -> σ′_palloc (nps::nil) m a n m a'.

Definition σ_palloc := σ′_palloc.

Figure 12. High level and low level specification for palloc function

Figs. 8-12 show how we use the two patterns to implement
and verify a simplified physical memory allocator palloc, which
allocates and returns the first free entry in the physical memory
allocation table. Figs. 8-10 show how we follow the getter-setter
pattern to abstract the allocation table into a new abstract state. As
shown in Fig. 8, we first turn the concrete C memory allocation table
implementation into an abstract Coq data type. Then we implement
the getter and setter functions for the memory allocation table, both
in C and Coq (see Fig. 9). The Coq functions $\bar\sigma_{at\_get}$ and
$\bar\sigma_{at\_set}$ are just intermediate specifications that are used later
in the overlay specifications. The actual underlay and overlay
specifications of the setter function at_set are shown in Fig. 10.

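We then prove $L_1 \vdash_{id} \mathtt{at\_set} \mapsto \kappa_{at\_set} : \mathtt{at\_set} \mapsto \sigma'_{at\_set}$, and
also $\mathtt{at\_set} \mapsto \sigma_{at\_set} \leq_R \mathtt{at\_set} \mapsto \sigma'_{at\_set}$.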

The  code  verification  (first  part)  is  easy  for  this  pattern  because 
the  memory  load  and  store  operations  in  the  underlay  match  the 
source  code  closely.  The  proof  can  be  discharged  by  our  automation 
tactic.  The  main  task  of  this  pattern  is  to  prove  refinement  (second 
part):  we  design  a  simulation  relation  R  relating  the  memory  storing 
the  global  variable  at  underlay  with  its  corresponding  abstract  data 
at overlay. The component $R_{mem}$ ensures that there is no permission
for allocation table AT in overlay memory state $m_2$, while the
component $R_{abs}$ is defined as follows:

• $\forall i \in [0, 2^{20})$, $R_{abs}$ enforces the writable permission on AT[i]
at underlay memory state $m_1$, and requires $(a_2.\mathrm{AT}\ i)$ at overlay
to be (ATV AT[i].t AT[i].u).

• Except for AT, $R_{abs}$ requires all other abstract data in underlay
and overlay to be the same.

The refinement proof for $L_2 \leq_R L'_1$ involves proving
that this relation R between underlay memory and overlay abstract
state is preserved by all the atomic primitives in both $L'_1$ and $L_2$.
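
The abstract-data part of the relation can be illustrated with a small C sketch. It is only an approximation under stated assumptions: memory permissions are not modeled, the table is shrunk to a hypothetical NPAGES entries, at_set is assumed to update only the u field, and all type and function names (at_entry, atv, underlay_mem, overlay_abs, r_abs, at_set_underlay, at_set_overlay) are invented for the illustration.

```c
#include <assert.h>

#define NPAGES 8u  /* hypothetical size; the text uses 2^20 entries */

typedef struct { unsigned t; unsigned u; } at_entry; /* concrete AT[i].t, AT[i].u */
typedef struct { unsigned t; unsigned u; } atv;      /* abstract value ATV t u   */

typedef struct { at_entry AT[NPAGES]; } underlay_mem; /* stands in for m1 */
typedef struct { atv      AT[NPAGES]; } overlay_abs;  /* stands in for a2 */

/* Data part of R_abs: every abstract entry mirrors the concrete one.
   The permission requirements of the real relation are not modeled. */
int r_abs(const underlay_mem *m1, const overlay_abs *a2) {
    for (unsigned i = 0; i < NPAGES; i++)
        if (a2->AT[i].t != m1->AT[i].t || a2->AT[i].u != m1->AT[i].u)
            return 0;
    return 1;
}

/* Underlay setter: a plain memory store (assumed to write the u field). */
void at_set_underlay(underlay_mem *m1, unsigned i, unsigned v) {
    m1->AT[i].u = v;
}

/* Overlay setter: the same update expressed on the abstract state. */
void at_set_overlay(overlay_abs *a2, unsigned i, unsigned v) {
    a2->AT[i].u = v;
}
```

Preservation by at_set then amounts to the observation that running both setters with the same arguments from related states yields related states again.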

After we abstract the memory and get/set operations, we implement
palloc on top of $L_2$, following the abs-fun pattern. The
previous overlay now becomes the new underlay (“$L_1$”). Fig. 11
shows both the implementation of palloc in ClightX and the abstract
function in Coq. As before, we separately show that
$L_1 \vdash_{id} \mathtt{palloc} \mapsto \kappa_{palloc} : \mathtt{palloc} \mapsto \sigma'_{palloc}$ and that
$\mathtt{palloc} \mapsto \sigma_{palloc} \leq_{id} \mathtt{palloc} \mapsto \sigma'_{palloc}$ holds. For the abs-fun pattern, the refinement proof
is easy. Since we do not introduce any new abstract states in this
pattern, the implementation only manipulates the abstract states
through the primitive calls of the underlay. Thus, as shown in Fig. 12,
the corresponding underlay and overlay specifications are exactly
the same, so the relation R here is the identity (id) and the proof
of refinement is trivial. The main task for the abs-fun pattern is to
verify the code, which is done using our automation tactic.

The  above  examples  show  that  for  the  getter-setter  pattern,  the 
primary  task  is  to  prove  data  abstraction,  while  for  the  abs-fun 
pattern,  the  main  task  is  to  do  simple  program  verification.  These 
two  tasks  are  well  understood  and  manageable,  so  the  decoupling 
(via  these  two  patterns)  makes  the  layer  construction  much  easier. 
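
To make the two patterns concrete, here is a minimal runnable C sketch of the palloc example. It assumes a small, hypothetical table size (AT_SIZE) and a plain global array (AT_u) in place of the kernel's allocation table; only at_get and at_set touch the array (getter-setter pattern), while palloc goes exclusively through those two primitives (abs-fun pattern). Callers are assumed to pass nps no larger than AT_SIZE.

```c
#include <assert.h>

/* Hypothetical table size for this sketch; the paper's table has 2^20 entries. */
#define AT_SIZE 16u

/* Underlay view: the allocation table is a plain global array (0 = free). */
static unsigned int AT_u[AT_SIZE];

/* Getter-setter pattern: the only functions that access AT_u directly. */
unsigned int at_get(unsigned int i)                 { return AT_u[i]; }
void         at_set(unsigned int i, unsigned int v) { AT_u[i] = v; }

/* abs-fun pattern: palloc never touches AT_u, only at_get/at_set.
   Returns the index of the first free entry below nps and marks it used,
   or nps itself when no entry is free. Assumes nps <= AT_SIZE. */
unsigned int palloc(unsigned int nps) {
    unsigned int i = 0, u;
    unsigned int freei = nps;
    while (freei == nps && i < nps) {
        u = at_get(i);
        if (u == 0)
            freei = i;
        i++;
    }
    if (freei != nps)
        at_set(freei, 1);
    return freei;
}
```

Because palloc is written only against at_get and at_set, replacing the array by any other representation leaves palloc and its verification untouched, which is the decoupling the two patterns aim for.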

5.  Layered  programming  in  LAsm 

In  this  section,  we  describe  LAsm,  the  Layered  Assembly  language, 
and  the  extended  machine  model  which  LAsm  is  based  on. 

The  reason  we  are  interested  in  assembly  code  and  behavior  is 
threefold.  First  of  all,  even  though  we  provide  ClightX  to  write  most 
code,  we  are  still  interested  in  the  actual  assembly  code  running  on 
the actual machine. In Section 6, we will provide a verified compiler
to  transport  all  proofs  of  code  written  in  ClightX  to  assembly. 

Secondly,  there  are  parts  of  software  that  have  to  be  manually 
written  in  assembly  for  various  reasons.  For  example,  the  standard 
implementation  of  kernel  context  switch  modifies  the  stack  pointer 
register  ESP,  which  does  not  satisfy  the  C  calling  convention  and 
has to be verified in assembly. A linker will be defined in Section 6
to  link  them  with  compiled  C  code. 

Last  but  not  least,  we  are  interested  not  only  in  the  behavior 
of  our  code,  but  also  in  the  behavior  of  the  context  that  will  call 
functions  defined  in  our  code.  To  be  as  general  as  possible,  we  allow 
the  context  to  include  all  valid  assembly  code  sequences.  To  this 
end,  it  is  necessary  to  transport  per-function  refinement  proofs  to  a 
whole-machine  contextual  refinement  proof. 

The  LAsm  assembly  language  We  start  from  the  32-bit  x86 
assembly  subset  specified  in  CompCert.  CompCert  x86  assembly  is 
modeled  as  a  state  machine  with  a  register  set  and  a  memory  state. 
The register set consists of eight 32-bit general-purpose registers and
eight XMM registers designated as scalar double-precision floating-point
operands. The memory state is the same as the one in Clight. In
particular,  each  function  executes  with  its  stack  frame  modeled  in 
its  own  memory  block,  so  that  the  stack  is  not  a  contiguous  piece 
of  memory.  Another  anomaly  regarding  function  calls  in  CompCert 
x86  assembly  is  that  the  return  address  is  stored  in  pseudo-register 
RA  instead  of  being  pushed  onto  the  stack,  so  that  the  callee  must 
allocate  its  own  stack  frame  and  store  the  return  address. 

Similarly  to  ClightX,  we  extend  the  machine  state  with  an 
abstract  state,  which  will  be  modified  by  primitives.  This  yields 
LAsm,  whose  syntax  is  the  same  as  that  of  CompCert  x86  assembly, 
except  that  the  semantics  will  be  parameterized  over  the  type  of 
abstract  states  and  the  specifications  of  primitives.  Most  notably, 
primitive  calls  are  syntactically  indistinguishable  from  normal 
function  calls,  yet  depend  on  the  specifications  semantically. 

Moreover,  in  our  Coq  formalization,  the  semantics  of  LAsm 
is  also  equipped  with  memory  accessors  for  address  translation  in 
order  to  handle  both  kernel  memory  linear  mapping  and  user  space 
virtual  memory.  However,  for  the  sake  of  presentation,  we  are  going 
to  describe  a  simplified  version  of  LAsm  where  memory  accesses 
only  use  the  kernel  memory. 

We define the semantics of LAsm in small-step form. The
machine state is $(\rho, m, a)$ where $\rho$ contains the values of registers,
m is the concrete memory state and a is the abstract state. Let M be
an LAsm module, which is a finite map from identifiers to arrays of
LAsm instructions. We write $\Gamma, L, M \vdash (\rho, m, a) \to (\rho', m', a')$ for
a transition step in the LAsm machine. The full syntax and formal
semantics of LAsm are described in the companion technical report.

Figure 11 (continued). Abstract palloc function in Coq:

Definition first_free a n :
  {v | 0 <= fst v < n
       /\ a.AT (fst v) = ATV (snd v) 0
       /\ ∀ x, 0 <= x < fst v
               -> ~ a.AT x = ATV _ 0}
  + {∀ x, 0 <= x < n
          -> ~ a.AT x = ATV _ 0}.

Function σ̄_palloc a nps :=
  match first_free a nps with
  | inleft (exist (i, t) _) =>
      (set_AT a i (ATV t 1), i)
  | _ => (a, nps)
  end.

Assembly layer interfaces The semantics of LAsm is parameterized
over a layer interface. Unlike C-style primitives (defined earlier),
which are specified in terms of an argument list and a return value,
primitives implemented in LAsm often utilize their full control over
the register set and are not restricted to a particular calling
convention (e.g., context switch). Therefore, it is necessary to extend
the structure of layer interfaces to allow assembly-style primitives
that modify the register set.

Definition 5 (Assembly-style primitive). An assembly-style primitive
specification p over the abstract state type A is a predicate
on $((\mathit{preg} \to \mathit{val}) \times \mathit{mem} \times A) \times ((\mathit{preg} \to \mathit{val}) \times \mathit{mem} \times A)$.
$p(\rho, m, a, \rho', m', a')$ says that the primitive p takes register set $\rho$,
memory state m and abstract state a as arguments, and returns
register set $\rho'$, memory state m' and abstract state a' as result.

By “style,” we mean the calling convention, not the language in
which  they  are  actually  implemented.  C-style  primitives  may  very 
well  be  implemented  as  hand-written  assembly  code  at  underlay. 

We can then define assembly layer interfaces by replacing the
primitive specification with our assembly-style one in the definition
of C layer interfaces. But,
to  make  reasoning  simpler,  when  defining  assembly  layer  interfaces, 
we  distinguish  C-style  from  assembly-style  primitives.  First,  C-style 
primitives  can  be  refined  by  other  C-style  primitives.  Second  and 
most  importantly,  it  becomes  possible  to  instantiate  the  semantics  of 
ClightX  with  an  assembly  layer  interface  by  just  considering  C-style 
primitives  and  ignoring  assembly-style  primitives  (which  might  not 
follow  the  C  calling  convention).  In  this  way,  ClightX  code  is  only 
allowed  to  call  C-style  primitives,  whereas  LAsm  can  actually  call 
both  kinds  of  primitives. 

Definition 6 (Assembly layer interface). An assembly layer interface
L is a tuple $L = (A, P_{ClightX}, P_{LAsm})$ where:

• $(A, P_{ClightX})$ is a C layer interface (as defined earlier)

• $P_{LAsm}$ is a finite map from identifiers to assembly-style primitive
specifications over the abstract state A. The domains of $P_{ClightX}$
and $P_{LAsm}$ shall be disjoint.
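
As an informal illustration of Definition 6, the following C sketch represents an assembly layer interface as two tables of primitives. It simplifies heavily and rests on assumptions not found in the text: specifications are modeled as deterministic functions rather than predicates, the memory component is omitted, and every name (abs_state, regset, c_prim, asm_prim, asm_layer_interface, domains_disjoint, clightx_lookup, at_get_spec, cswitch_spec) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NREGS = 8 };  /* hypothetical register count */

typedef struct { int at[4]; }      abs_state; /* stand-in for the abstract state A */
typedef struct { int reg[NREGS]; } regset;    /* stand-in for preg -> val */

/* C-style primitive: argument list in, return value out, may update a. */
typedef int  (*c_prim)(abs_state *a, const int *args, size_t nargs);
/* Assembly-style primitive: free to rewrite the whole register set. */
typedef void (*asm_prim)(regset *rs, abs_state *a);

typedef struct { const char *name; c_prim   spec; } c_entry;
typedef struct { const char *name; asm_prim spec; } asm_entry;

typedef struct {
    const c_entry   *p_clightx; size_t n_c;   /* P_ClightX */
    const asm_entry *p_lasm;    size_t n_asm; /* P_LAsm    */
} asm_layer_interface;

/* Side condition of Definition 6: the two domains must be disjoint. */
int domains_disjoint(const asm_layer_interface *L) {
    for (size_t i = 0; i < L->n_c; i++)
        for (size_t j = 0; j < L->n_asm; j++)
            if (strcmp(L->p_clightx[i].name, L->p_lasm[j].name) == 0)
                return 0;
    return 1;
}

/* ClightX code can only resolve C-style primitives; assembly-style
   primitives are invisible to it, so the lookup returns NULL for them. */
c_prim clightx_lookup(const asm_layer_interface *L, const char *id) {
    for (size_t i = 0; i < L->n_c; i++)
        if (strcmp(L->p_clightx[i].name, id) == 0)
            return L->p_clightx[i].spec;
    return NULL;
}

/* Two sample primitives, invented for the example. */
int at_get_spec(abs_state *a, const int *args, size_t nargs) {
    return (nargs == 1 && args[0] >= 0 && args[0] < 4) ? a->at[args[0]] : -1;
}
void cswitch_spec(regset *rs, abs_state *a) {
    (void)a;
    rs->reg[4] = 0x1000; /* overwrites a register, unlike a C-style call */
}
```

Keeping the two tables separate is what lets the same interface be used both by ClightX (which sees only the first table) and by LAsm (which sees both).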

Whole-machine  semantics  and  contextual  refinement  Based  on 
the  relational  transition  system  which  we  just  defined  for  LAsm, 
we  can  define  the  whole-machine  semantics  including  not  only  the 
code  that  we  wrote  by  hand  or  that  we  compile,  but  also  the  context 
code  that  shall  call  our  functions.  To  this  end,  it  suffices  to  equip  the 
semantics  with  a  notion  of  initial  and  final  state,  in  a  way  similar  to 
the  CompCert  x86  whole-program  assembly  semantics. 

In  CompCert,  the  initial  state  consists  of  an  empty  register  set 
with  only  EIP  (instruction  pointer  register)  pointing  to  the  main 
function  of  the  module,  and  the  memory  state  is  constructed  by 
allocating  a  memory  block  for  each  global  variable  of  the  program. 
We  follow  the  same  approach  for  LAsm,  except  that  we  also  need 
an  initial  abstract  state,  provided  by  the  layer  interface,  so  we  need 
to  extend  its  definition: 

Definition 7 (Whole-machine layer interface). A whole-machine
layer interface L is a tuple $L = (A, P_{ClightX}, P_{LAsm}, a_0)$ where:

• $(A, P_{ClightX}, P_{LAsm})$ is an assembly layer interface

• $a_0 : A$ is the initial abstract state.

Definition 8 (Whole-machine initial state). The whole-machine
LAsm initial state for layer interface L and module M is the LAsm
state $(\rho_0, m_0, a_0)$ defined as follows:

• $\rho_0(r) = \begin{cases} (\Gamma(\mathrm{main}), 0) & \text{if } r = \mathrm{EIP} \\ 0 & \text{if } r = \mathrm{RA} \\ \bot & \text{otherwise} \end{cases}$

• $m_0$ is constructed from the global variables of $\Gamma, L, M$

• $a_0$ is the initial abstract state specified in L

Definition 9 (Whole-machine final state). A whole-machine LAsm
state $(\rho, m, a)$ is final with return code n if, and only if, $\rho(\mathrm{EAX}) = n$
and $\rho(\mathrm{EIP}) = 0$, where EAX is the accumulator register.

Notice that $\rho(\mathrm{EIP})$ contains the integer 0, which is also the initial
return  address  and  is  not  a  valid  pointer.  This  ensures  that  executions 
do  not  go  beyond  a  final  state,  following  the  CompCert  x86  whole- 
program  semantics:  main  has  returned  to  its  “caller”,  which  does 
not  exist.  Thus,  the  final  state  is  uniquely  determined  (there  can 
be  no  other  possible  behavior  once  such  a  state  is  reached),  so  the 
whole-machine  semantics  is  deterministic  once  the  primitives  are. 

Definition 10 (Whole-machine behavior). Let $\Gamma$ be a mapping of
global variables to memory blocks. Then, we say that

• $LAsm(\Gamma, L, M)$ diverges if there is an infinite execution
sequence from the whole-machine initial state for L

• $LAsm(\Gamma, L, M)$ terminates with return code n if there is a finite
execution sequence from the whole-machine initial state for L
to a whole-machine final state with return code n

• $LAsm(\Gamma, L, M)$ goes wrong if there is a finite execution
sequence from the whole-machine initial state for L to a non-final
state that can take no step.
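
The three behaviors can be illustrated with a toy state machine in C. This is a sketch under loudly stated assumptions, not the LAsm semantics: the state keeps only two registers and an integer abstract state, memory is omitted, the step relation is a deterministic function, and divergence (which is undecidable in general) is approximated by running out of a fuel budget. All names (mstate, step_fn, outcome, is_final, run, toy_step) are hypothetical.

```c
#include <assert.h>

/* Toy machine state: two registers plus an abstract state; memory omitted. */
typedef struct { int eip; int eax; int a; } mstate;

/* A step function returns 1 if a step was taken, 0 if the state is stuck. */
typedef int (*step_fn)(mstate *s);

typedef enum { TERMINATES, GOES_WRONG, OUT_OF_FUEL } outcome;

/* Final state as in Definition 9: EIP holds 0; EAX is the return code. */
int is_final(const mstate *s) { return s->eip == 0; }

/* Runs at most `fuel` steps. OUT_OF_FUEL only approximates divergence. */
outcome run(step_fn step, mstate *s, unsigned fuel) {
    while (fuel-- > 0) {
        if (is_final(s)) return TERMINATES;
        if (!step(s))    return GOES_WRONG; /* non-final state with no step */
    }
    return is_final(s) ? TERMINATES : OUT_OF_FUEL;
}

/* Toy program: address 1 counts a down into eax and then "returns" to 0;
   address 3 spins forever; every other address holds no instruction. */
int toy_step(mstate *s) {
    switch (s->eip) {
    case 1:
        if (s->a > 0) { s->a--; s->eax++; } else { s->eip = 0; }
        return 1;
    case 3:
        return 1;
    default:
        return 0;
    }
}
```

Stopping as soon as a final state is reached mirrors the remark that executions do not go beyond a final state.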

Then,  we  are  interested  in  refinement  between  whole  machines: 

Definition 11 (Whole-machine refinement). Let $L_{high}$, $L_{low}$ be
two whole-machine assembly layer interfaces, and $M_{high}$, $M_{low}$
be two LAsm modules. Then, we say that $M_{low}@L_{low}$ refines
$M_{high}@L_{high}$, and write $M_{low}@L_{low} \sqsubseteq M_{high}@L_{high}$ if, and only
if, for any $\Gamma$ such that $dom(L_{high}) \cup dom(M_{high}) \cup dom(L_{low}) \cup
dom(M_{low}) \subseteq dom(\Gamma)$ and $LAsm(\Gamma, L_{high}, M_{high})$ does not go
wrong, the following hold: (1) $LAsm(\Gamma, L_{low}, M_{low})$ does not go wrong; (2) if
$LAsm(\Gamma, L_{low}, M_{low})$ terminates with return code n, then so does
$LAsm(\Gamma, L_{high}, M_{high})$; (3) if $LAsm(\Gamma, L_{low}, M_{low})$ diverges, so
does $LAsm(\Gamma, L_{high}, M_{high})$.

In  our  Coq  implementation,  we  actually  formalized  the  semantics 
of  LAsm  with  a  richer  notion  of  observable  behaviors  involving 
CompCert-style events such as I/O. Thus, we define the whole-machine
behaviors and refinement using event traces à la CompCert
[20, §3.5 sqq.]: if the higher machine does not go wrong, then every
valid  behavior  of  the  lower  machine  is  a  valid  behavior  of  the  higher. 

Finally,  we  can  define  contextual  refinement  between  layer 
interfaces  through  a  module  M : 

Definition 12 (Contextual refinement). We say a module M implements
an overlay $L_{high}$ on top of an underlay $L_{low}$, and write
$L_{low} \vDash M : L_{high}$ if, and only if, for any module (context) M' disjoint
from M, $L_{low}$, $L_{high}$, we have $(M \oplus M')@L_{low} \sqsubseteq M'@L_{high}$.

Per-module semantics As for ClightX, we can also specify the
semantics of an LAsm module as a layer interface. However, a major
difference between ClightX and LAsm is that it is not possible to
uniquely characterize the “per-function final state” at which function
execution should stop. Indeed, as in LAsm there is no control stack,
when considering the per-function semantics of a function f, it is
not possible to distinguish f exiting and returning control to its
caller from a callee g returning to f.

Thus, even though both the step relation of the LAsm semantics
and the primitive specifications (of a layer interface) are deterministic,
the semantics of a function could still be non-deterministic.

Definition 13. Let $L = (A, \_)$ be an assembly layer interface,
and M be an LAsm module. The module semantics $\llbracket M \rrbracket L$ is then the
assembly layer interface $\llbracket M \rrbracket L = (A, \emptyset, P)$, where the assembly-style
primitive specification P is defined for each $f \in dom(M)$
using the small-step semantics of LAsm as follows:

$P(f)(\rho, m, a, \rho', m', a') \iff \Gamma(f) = b \wedge \rho(\mathrm{EIP}) = (b, 0) \wedge \Gamma, L, M \vdash (\rho, m, a) \to^{+} (\rho', m', a')$

Soundness of per-module refinement In this paper, we aim at
showing that the layer calculus presented earlier is a powerful
device to prove contextual refinement: instead of proving the
whole-machine contextual refinement directly, we only need to prove the
downward simulation relations about individual modules, notated
as $L_{low} \vdash_R M : L_{high}$, and apply the soundness theorem to get the
contextual refinement properties at the whole-machine level.

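Lemma 1 (Downward simulation diagram). Let $(L_{low}, M, L_{high})$
be a certified layer, such that $L_{low} \vdash_R M : L_{high}$. Then, for any
module M', we have the following downward simulation diagram:

[Diagram: each step $s_{high} \to s'_{high}$ of the machine $\Gamma, L_{high}, M'$ is matched,
under R, by steps of the low-level machine starting from the related state $s_{low}$.]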


Theorem 1 (Soundness). Let $(L_{low}, M, L_{high})$ be a certified layer.
If the primitive specifications of $L_{low}$ are deterministic and if
$L_{low} \vdash_R M : L_{high}$, then $L_{low} \vDash M : L_{high}$.


Proof. Since the whole machine $LAsm(\Gamma, L_{low}, M)$ is deterministic,
we can flip the downward simulation given by Lemma 1 to an
upward one, hence the whole-machine refinement. □


Since  the  per-function  semantics  is  non-deterministic  due  to  its 
final  state  not  being  uniquely  defined,  we  can  only  flip  the  downward 
simulation  to  contextual  refinement  at  the  whole-machine  level. 


6.  Certified  compilation  and  linking 

We  would  like  to  write  most  parts  of  our  kernel  in  ClightX  rather 
than  in  LAsm  for  easier  verification.  This  means  that,  for  each  layer 
interface  L,  we  have  to  compile  our  ClightX(L)  source  code  to  the 
corresponding  LAsm(L)  assembly  language  in  such  a  way  that  all 
proofs  at  the  ClightX  level  are  preserved  at  the  LAsm  level. 

This  section  describes  how  we  have  modified  the  CompCert 
compiler  to  compile  certified  C  layers  into  certified  assembly  layers. 
It  also  talks  about  how  we  link  compiled  certified  C  layers  with 
other  certified  assembly  layers. 

6.1  The  CompCertX  verified  compiler 

To  transport  the  proofs  at  ClightX  down  to  LAsm,  we  adapt  the 
CompCert  verified  compiler  to  parameterize  all  its  intermediate 
languages  over  the  layer  interface  L  similarly  to  how  we  defined 
ClightX(L),  including  the  assembly  language.  This  gives  rise  to 
CompCertX(L) (for “CompCert extended”, where external functions
are instantiated with layer interface L).

CompCertX  goes  from  ClightX  to  the  similarly  parameterized 
AsmX  and  then  to  LAsm.  We  retain  all  features  and  optimizations 
of  CompCert,  including  function  inlining,  dead  code  elimination, 
common  subexpression  elimination,  and  tail  call  recognition. 

Compiler correctness for CompCertX Because CompCert only
proves semantics preservation for whole programs, the major challenge
is to adapt the semantics preservation statements of all compilation
passes (from Clight to assembly) to per-function semantics.

The operational semantics of all CompCert languages are given
through small-step transition relations equipped with sets of
whole-program initial and final states, so we have to redesign those states
for the per-function setting. For the initial state, whereas CompCert
constructs an initial memory and calls main with no arguments, we
take the function pointer to call, the initial memory, and the list of
arguments as parameters. For the final state, we take not only the
return value, but also the memory state when we exit the function.

Consequently,  the  compiler  correctness  proofs  have  to  change. 
Currently, CompCert uses a downward simulation diagram [20, §2.1]
for  each  pass  from  Clight,  then,  thanks  to  the  fact  that  the  CompCert 
assembly  language  is  deterministic  (up  to  input  values  given  by 
the  environment),  CompCert  composes  all  of  them  together  before 
turning  them  to  a  single  upward  simulation  which  actually  entails 
that  the  compiled  code  refines  the  source  code. 

In this work, we follow a similar approach: for each individual
pass, we prove per-function semantics preservation in a downward
simulation flavor. We do not, however, turn it into an upward
simulation, because the whole layer refinement proof is based
on downward simulation, which is itself turned into an upward
simulation at whole-machine contextual refinement thanks to the
determinism (up to the environment) of LAsm(L).

Memory state during compilation The main difference between
CompCert and CompCertX lies in the memory given at the beginning
of a function call.

In  the  whole-program  setting,  the  initial  state  is  the  same  across 
all  languages,  because  it  is  uniquely  determined  by  the  global 
variables  (which  are  preserved  by  compilation).  On  the  other  hand,  in 
the  middle  of  the  execution  when  entering  an  arbitrary  function,  the 
memory  in  Clight  is  different  from  its  assembly  counterpart  because 
CompCert  introduces  memory  transformations  such  as  memory 
injections  or  extensions  mis  .4]  to  manage  the  callees’  stack  frames. 
This is actually advantageous for compiling the handling of arguments
and the return address.

For  CompCertX,  within  the  module  being  compiled,  the  same 
memory  state  mismatch  also  exists.  At  module  entry,  however,  we 
cannot  assume  much  about  the  memory  state  because  it  is  given  as 
a  parameter  to  the  semantics  of  each  function  in  the  module.  In  fact, 
this  memory  state  is  determined  by  the  caller,  so  it  may  very  well 
come  from  non-ClightX  code  (e.g.,  arbitrary  assembly  user  code), 
thus  we  have  to  take  the  same  memory  as  initial  state  across  all  the 
languages  of  CompCertX.  It  follows  then  that  the  arguments  of  the 
function  already  have  to  be  present  in  the  memory,  following  the 
calling  convention  imposed  by  the  assembly  language,  even  though 
ClightX  does  not  read  the  arguments  from  memory. 

Another  difference  between  CompCert  and  CompCertX  is  the 
treatment  of  final  memory  states.  In  CompCert,  only  the  return 
value  of  a  program  is  observable  at  the  end;  the  final  memory  state 
is not. By contrast, in CompCertX, the final memory state is passed
back to the caller, and hence is observable. Thus, it is necessary to account
for  memory  transformations  when  relating  the  final  states  in  the 
simulation  diagrams. 

Compilation refinement relation Finally, the per-function compiler
correctness statement of CompCertX can be roughly summarized
as a commutative diagram and is formally defined below.

Definition 14. Let $L_C$ be a C layer interface, and $L_{Asm}$ be an
assembly layer interface. We say that $L_C$ is simulated by $L_{Asm}$
by compilation, written $L_C \leq_{comp} L_{Asm}$, if and only if, for any $\Gamma$, and
for any execution $L_C(f)(l, m, a, v, m', a')$ of a primitive f of $L_C$
for some list l of arguments and some return value v, from memory
state m and abstract state a to m' and a', and for any register map
$\rho$ such that the following requirements hold:

1. the memory m contains the arguments l in the stack pointed to
by $\rho(\mathrm{ESP})$

2. EIP points to the function f being called: $\rho(\mathrm{EIP}) = (\Gamma(f), 0)$

Then, there is a primitive execution $L_{Asm}(f)(\rho, m, a, \rho', m'', a')$
and a memory injection j from m' to m'' preserving the addresses
of m such that the following holds:

• the values of callee-save registers in $\rho$ are preserved in $\rho'$;

• $\rho'(\mathrm{EIP})$ points to return address $\rho(\mathrm{RA})$;

• the return value contained in $\rho'(\mathrm{EAX})$ (for integers/pointers) or
$\rho'(\mathrm{FP0})$ (for floating-points) is related to v by j.

Theorem 2. Let L be an assembly layer interface with all C-style
primitives preserving memory transformations. Then, for any M:

$\llbracket M \rrbracket L \leq_{comp} \llbracket \mathrm{CompCertX}(M) \rrbracket L$

More  details  can  be  found  in  the  companion  technical  report. 

6.2  Linking  compiled  code  with  assembly  code 

Contrary  to  traditional  separate  compilation,  we  target  compiling 
ClightX  functions  that  may  be  called  by  LAsm  assembly  code.  Since 
the  caller  may  be  arbitrary  LAsm  code,  not  necessarily  well-behaved 
code  written  in  or  compiled  from  ClightX,  we  have  to  assume  that 
the  memory  we  are  given  follows  LAsm  layout.  When  reasoning 
about  memory  states  that  involve  compiled  code,  we  then  have  to 
accommodate  memory  injections  introduced  by  the  compiler. 

During  a  whole-machine  refinement  proof,  the  two  memory 
states  of  the  overlay  and  the  underlay  are  related  with  a  simulation 
relation  R.  However,  consider  when  the  higher  (LAsm)  code  calls 
an  overlay  primitive,  that,  in  the  underlay,  is  compiled  from  ClightX. 
Because  during  the  per-primitive  simulation  proofs  we  ignored  the 
effects  of  the  compiler,  the  memory  injection  introduced  by  the 
compiler  may  become  a  source  of  discrepancy.  That  is  why  we 
encapsulate,  in  R,  a  memory  injection  between  the  higher  memory 
state and the lower memory state. This injection is the identity until
the  lower  state  calls  a  compiled  ClightX  function.  Then,  at  every 
such  call,  the  layer  simulation  relation  R  can  “absorb”  compilation 
refinement  on  its  right-hand  side: 

Lemma 2. If L' and $L_C$ are C overlays and $L_{Asm}$ is an assembly
underlay, with $L' \leq_R L_C$ and $L_C \leq_{comp} L_{Asm}$, then $L' \leq_R L_{Asm}$.

Proof. If R encapsulates a memory injection $j_0$, and compilation
introduces a memory injection j, then the simulation relation R
will still hold with the composed memory injection $j \circ j_0$. □
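
The composition used in this proof can be sketched in C. The sketch assumes the usual CompCert-style view of a memory injection as a partial map from a block to a target block and an offset shift; the fixed block count NBLOCKS and the names inj_target, meminj, identity_inj and compose_inj are hypothetical and chosen only for the illustration. Target block numbers are assumed to lie in [0, NBLOCKS).

```c
#include <assert.h>

#define NBLOCKS 4 /* hypothetical number of memory blocks */

/* Where a source block lands: unmapped, or (target block, offset shift). */
typedef struct { int mapped; int block; long delta; } inj_target;

/* A memory injection as a partial map over block identifiers. */
typedef struct { inj_target of[NBLOCKS]; } meminj;

/* The identity injection: every block maps to itself with shift 0. */
meminj identity_inj(void) {
    meminj r;
    for (int b = 0; b < NBLOCKS; b++) {
        r.of[b].mapped = 1;
        r.of[b].block  = b;
        r.of[b].delta  = 0;
    }
    return r;
}

/* Composition "second after first": a block is mapped only if both
   stages map it, and the offset shifts add up. */
meminj compose_inj(const meminj *first, const meminj *second) {
    meminj r;
    for (int b = 0; b < NBLOCKS; b++) {
        r.of[b].mapped = 0;
        r.of[b].block  = 0;
        r.of[b].delta  = 0;
        if (first->of[b].mapped) {
            inj_target mid = first->of[b];
            inj_target end = second->of[mid.block];
            if (end.mapped) {
                r.of[b].mapped = 1;
                r.of[b].block  = end.block;
                r.of[b].delta  = mid.delta + end.delta;
            }
        }
    }
    return r;
}
```

With j0 as the injection carried by R and j as the one introduced by compilation, compose_inj(&j0, &j) plays the role of the composed injection in the proof; composing with the identity leaves j unchanged, matching the remark that the injection stays the identity until compiled code is called.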

Summary of the refinement proof with compilation and linking
Finally, the outline of proving layer refinement $L_1 \vdash_R M : L_2$,
where $M = \mathrm{CompCertX}(M_C) \oplus M_{Asm}$ is the union of a compiled
ClightX module and an LAsm module, is summarized in the
following steps, also shown in Fig. 13.

1. Split the overlay $L_2$ into two layer interfaces $L_{2,C}$ and $L_{2,Asm}$
where $L_{2,C}$ is a C layer interface containing primitive specifications
to be implemented by ClightX code (necessarily C-style)
and $L_{2,Asm}$ is an assembly layer interface containing all other
primitives (implemented in LAsm), so that $L_2 = L_{2,C} \oplus L_{2,Asm}$.

2. For each such part of the overlay, design an intermediate layer
interface $L'_{1,C}$ and $L'_{1,Asm}$ with the same abstract state type as
$L_1$ (see Section 4.3), and prove $L_{2,C} \leq_R L'_{1,C}$ and $L_{2,Asm} \leq_R L'_{1,Asm}$
independently of the implementation.

3. For both intermediate layer interfaces, prove that they are implemented
by modules $M_C$ and $M_{Asm}$ on top of $L_1$ respectively, i.e.
$L'_{1,C} \leq_{id} \llbracket M_C \rrbracket L_1$ and $L'_{1,Asm} \leq_{id} \llbracket M_{Asm} \rrbracket L_1$.

4. Then, compile $M_C$: $\llbracket M_C \rrbracket L_1 \leq_{comp} \llbracket \mathrm{CompCertX}(M_C) \rrbracket L_1$.

5. Using LLe-Trans and LLe-Mon to combine 2. and 3., we have:
$L_{2,C} \oplus L_{2,Asm} \leq_R L'_{1,C} \oplus L'_{1,Asm} \leq_{id} \llbracket M_C \rrbracket L_1 \oplus \llbracket M_{Asm} \rrbracket L_1$.
On the C side (left of $\oplus$), Lemma 2 shows that $\leq_R$ absorbs $\leq_{comp}$.
By 4.: $L_{2,C} \oplus L_{2,Asm} \leq_R \llbracket \mathrm{CompCertX}(M_C) \rrbracket L_1 \oplus \llbracket M_{Asm} \rrbracket L_1$.

6. From the soundness of HComp (proof in the technical report), and because
$M = \mathrm{CompCertX}(M_C) \oplus M_{Asm}$, we have:
$\llbracket \mathrm{CompCertX}(M_C) \rrbracket L_1 \oplus \llbracket M_{Asm} \rrbracket L_1 \leq_{id} \llbracket M \rrbracket L_1$.

7. Finally, by combining 5. and 6., we have $L_{2,C} \oplus L_{2,Asm} \leq_R \llbracket M \rrbracket L_1$.
Since $L_2 = L_{2,C} \oplus L_{2,Asm}$, by using LLe-Ub-Left and
LLe-Comm, we have: $L_2 \leq_R \llbracket M \rrbracket L_1 \leq_{id} \llbracket M \rrbracket L_1 \oplus L_1 \leq_{id} L_1 \oplus \llbracket M \rrbracket L_1$,
thus we get $L_1 \vdash_R M : L_2$.

Figure 13. Proof steps of layer refinement $L_1 \vdash_R M : L_2$

7. Case study: certified OS kernels

To  demonstrate  the  power  of  our  new  languages  and  tools,  we  have 
applied  our  new  layered  approach  to  specify  and  verify  four  variants 
of  mCertiKOS  kernels  in  the  Coq  proof  assistant.  This  section 
describes  these  kernels  and  the  benefits  of  the  approach. 

The  mCertiKOS  base  kernel  is  a  simplified  uniprocessor  version 
of  the  CertiKOS  kernel  fT^  designed  for  the  32  bit  x86  architecture. 
It  provides  a  multi-process  environment  for  user-space  applications 
using  separate  virtual  address  space,  where  the  communications 
between  different  applications  are  established  by  message  passing. 
The  mCertiKOS-hyp  kernel,  built  on  top  of  the  base  kernel,  is  a 
realistic hypervisor kernel that can boot recent versions of unmodified
Linux operating systems (Debian 6.0.6 and Ubuntu 12.04.2).
The  mCertiKOS-rz  kernel  extends  the  hypervisor  supporting  “ring 
0”  processes,  hosting  “certifiably  safe”  services  and  application 
programs  inside  the  kernel  address  space.  Finally,  we  strip  the  last 
kernel  down  to  the  mCertiKOS-emb  kernel,  removing  virtualization, 
virtual  memory,  and  user-space  interrupt  handling.  This  results  in  a 
minimal  operating  system  suitable  for  embedded  environments. 

The layer structures of these kernels are shown in the top half
of Fig. 14; each block in the top half represents a collection of
sub-layers shown in the bottom half (as we zoom in on mCertiKOS-hyp).

mCertiKOS The layered approach is the key to our success in fully
certifying a kernel. In Sec. 4.3, we have shown how to define getters
and  setters  for  abstract  data  types  like  those  in  Fig.  allowing 
higher  layers  to  manipulate  abstract  states.  Furthermore,  layering 
is  also  crucial  to  certification  of  thread  queues  as  discussed  in 
Sec.|^  Instead  of  directly  proving  that  a  C  linked-list  implements  a 
functional  list,  we  insert  an  intermediate  layer  as  shown  in  Fig.[^to 
divide  the  difficult  task  into  two  steps. 

These may look like mere proof techniques for enabling abstract
states or reducing proof effort, but they echo the following mantra,
which makes our certification more efficient and scalable:

Abstract in minimal steps, specify full behavior, and hide all
underlying details.

Figure 14. Various mCertiKOS layer structures. Layer short-hands:
TRAP: interrupt handling; VIRT: virtualization; PROC: process
management; THR: thread management; VM: virtual memory; MM:
physical memory management.

This is also how we prove the overall contextual correctness guarantees
for all system calls and interrupt handlers. Fig. 15 shows the
call  graph  of  the  page  fault  handler,  including  all  functions  called 
both  directly  and  indirectly.  Circles  indicate  functions,  solid  arrows 
mean  primitive  invocations,  and  faint  dashed  lines  are  primitives 
that  are  translated  by  all  the  layers  they  pass  through. 

Defined in the TSysCall layer interface, the page fault handler makes
use of proc_exit and proc_start, both defined in the PProc layer
interface. Since their invocations are separated by other
primitive calls, one may expect that the invariants need to be
re-established or the effects of the in-between calls re-interpreted.
Fortunately, as our mantra suggests, when the in-between layers
translate the two primitives to the TTrap layer interface, their behaviors
are fully specified in terms of TTrap’s abstract states, and
the invariants of the PProc layer interface are considered underlying
details and have all been hidden. This is especially important for
calls  like  proc_exit  to  ikern_set  which  span  over  20  layers  with  the 
abstract  states  so  different  that  direct  translation  is  not  feasible. 

Finally,  kernel  initialization  is  another  difficult  task  that  has  been 
missing  from  other  kernel  verification  projects.  The  traditional 
kernel  initialization  process  is  not  compatible  with  “specify  full 
behavior  and  hide  all  underlying  details."  For  example,  start_kernel 
in  Linux  kernel  makes  a  sequence  of  calls  to  module  initializations. 
mCertiKOS’s initialization (see its call graph in Fig. 16) is a chain of
calls  to  layer  initializations;  this  pattern  complies  with  the  guideline 
that  initializing  one  layer  should  hide  the  detail  about  initializing  the 
lower  layers.  Without  layering,  the  specifications  of  all  functions 
will  be  populated  with  initialization  flags  for  each  module  they 
depend  on.  This  makes  encapsulation  harder  and  could  also  lead  to 
a  quadratic  blowup  in  size  and  proving  effort. 

mCertiKOS-hyp  The mCertiKOS-hyp kernel provides core primitives
to build full-fledged user-level hypervisors by supporting one of
the two popular hardware virtualization technologies: AMD SVM.
The primitives include the operations for manipulating the virtual
machine status, handling VMEXITs, starting or stopping a virtual
machine, etc. The details of virtualization, e.g., the virtual machine
control block and the nested page table, are hidden from the guest
applications. The hypervisor functionalities are implemented in nine
layers and then inserted in between the process management and interrupt
handling layers. The layered approach allows us to do so while
(1) only modeling virtualization-specific structures when needed;
(2) retaining primitives in the layer interface PProc by systematic
lifting; and (3) adding new primitives (including a new initialization
function) guaranteed not to interfere with existing primitives.

Figure 15. Call graph of the page fault handler

Figure 16. Call graph of mCertiKOS initializer


mCertiKOS-rz  The  mCertiKOS-rz  kernel  explores  a  different 
dimension — instead  of  adding  intermediate  layers,  we  augmented 
a  few  existing  layers  (in  mCertiKOS-hyp)  with  support  of  ring  0 
processes.  The  main  modification  is  at  PProc,  where  an  additional 
kind  of  threads  is  defined.  However,  all  the  layers  between  PProc 
and  TSysCall  also  need  to  be  extended  to  expose  the  functionality 
as  system  calls.  Thankfully,  since  all  the  new  primitives  are  already 
described  in  deep  specifications,  lifting  them  to  system  calls  only 
requires  equality  reasoning  in  Coq. 

mCertiKOS-emb  The mCertiKOS-emb kernel cuts features down
to a bare minimum: it does not switch to user mode, hence does not
require memory protection and does not provide system call interfaces.
This requires removing features instead of adding them. Since
the layered structure minimizes entanglements by eliminating unnecessary
dependencies and code coupling, the removal process was
relatively easy and straightforward. Moreover, removing the top 12
layers requires no additional specifications for those now top-level
primitives: deep specifications are suitable for both internal reasoning
and external descriptions. Thread and process management
layers  now  sit  directly  on  top  of  physical  memory  management; 
virtual  memory  is  never  enabled.  The  layers  remain  largely  the  same 
barring  the  removal  of  primitives  mentioning  page  tables. 

Evaluation  and  limitations  The  planning  and  development  of 
mCertiKOS  took  9.5  person  months  plus  2  person  months  on  linking 
and code extraction. With the infrastructure in place, mCertiKOS-hyp
took only 1.5 person months to finish, and mCertiKOS-rz and
mCertiKOS-emb took half a person month each. The kernels are
written,  layer  by  layer,  in  LAsm  and  ClightX  abstract  syntaxes  along 
with  driver  functions  specifying  how  to  compose  (link)  them.  All 
of  those  are  in  Coq  for  the  proofs  to  refer  to.  We  utilize  Coq’s  code 
extraction  to  get  an  OCaml  program  which  contains  CompCertX, 
the  abstract  syntax  trees  of  the  kernels,  and  the  driver  functions, 
which  invoke  CompCertX  on  pieces  of  ClightX  code  and  generate 
the  full  assembly  file.  The  output  of  the  OCaml  program  is  then  fed 
to  an  assembler  to  produce  the  kernel  executable. 

With the device drivers (running as user processes) and a cooperative
scheduler, most of the benchmarks in lmbench are under 2x
slowdown running in mCertiKOS-hyp, well within expected overhead.
Ring 0 processes, not used in the above experiment, can easily
lower the number as we measured one to two orders of magnitude
reduction in the number of cycles needed to serve system calls.

Because  the  proof  was  originally  developed  directly  in  terms  of 
abstract  machines  and  program  transformations,  the  current  code 
base  does  not  yet  reflect  the  calculus  presented  in  Sec.  in  its 
entirety.  Notably,  vertical  composition  is  done  at  the  level  of  the 
whole-machine  contextual  refinements  obtained  by  applying  the 
soundness  theorem  to  each  individual  abstraction  layer. 

Outside  our  verified  kernels  (mCertiKOS-hyp  consists  of  about 
3000  lines  of  C  and  assembly),  there  are  300  lines  of  C  and  170  lines 
of  x86  assembly  code  that  are  not  verified  yet:  the  preinit  procedure, 
the  ELF  loader  used  by  user  process  creation,  and  functions  such  as 
memcpy  which  currently  cannot  be  verified  because  of  a  limitation 
arising  from  the  CompCert  memory  model.  Device  drivers  are 
not  verified  because  LAsm  lacks  device  models  for  expressing 
the  correctness  statement.  Finally,  the  CompCert  assembler  for 
converting  LAsm  into  machine  code  remains  unverified. 

8.  Related  work 

Hoare-style program verification  Hoare logic (T4) and its modern
variants 03 13(23 were introduced to prove strong (partial
or total) correctness properties of programs annotated with pre-
and postconditions. A total-correctness Hoare triple [P]C[Q] often
means a refinement between the implementation C and the specification
[P, Q]: given any state S, if the precondition P(S) holds,
then  the  command  C  can  run  safely  and  terminate  with  a  state  that 
satisfies  Q.  Though  not  often  done,  it  is  also  possible  to  introduce 
auxiliary/ghost  states  to  serve  as  “abstract  states”  and  prove  that  a 
program  implements  a  specification  via  a  simulation. 
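
To make the reading of a total-correctness triple [P]C[Q] concrete, the following sketch checks it by brute force over a finite sample of states. It is only an illustration and not a proof technique used in this paper: the helper names (run_countdown, holds_total), the integer state, and the fuel bound that approximates non-termination are all assumptions of the example.

```python
from typing import Callable, Iterable, Optional

State = int

def run_countdown(s: State, fuel: int) -> Optional[State]:
    """The command C: `while s != 0: s -= 1`.

    Returns the final state, or None when the step budget `fuel` is
    exhausted, which this sketch treats as a failure to terminate.
    """
    while s != 0:
        if fuel == 0:
            return None
        s -= 1
        fuel -= 1
    return s

def holds_total(pre: Callable[[State], bool],
                command: Callable[[State, int], Optional[State]],
                post: Callable[[State], bool],
                states: Iterable[State],
                fuel: int = 1000) -> bool:
    """Check [pre] command [post] on every sampled state.

    For each state satisfying the precondition, the command must
    terminate within the budget and end in a state satisfying post.
    """
    for s in states:
        if pre(s):
            t = command(s, fuel)
            if t is None or not post(t):
                return False
    return True

sample = range(-5, 50)
# [s >= 0] C [s == 0] holds: the loop terminates from every non-negative state.
assert holds_total(lambda s: s >= 0, run_countdown, lambda t: t == 0, sample)
# [true] C [s == 0] fails: from a negative state the loop never reaches 0.
assert not holds_total(lambda s: True, run_countdown, lambda t: t == 0, sample)
```

The second check shows why the precondition is part of the specification [P, Q]: weakening it admits states from which the command does not terminate.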

Our  layer  language  can  be  viewed  as  a  novel  way  of  imposing 
a  module  system  over  Hoare-style  program  verification.  We  insist 
on  using  interfaces  with  deep  specifications  and  we  address  the 
“conflicting  abstract  states”  problem  mentioned  in  Sec.|^  Traditional 
program  verification  does  not  always  use  deep  specification  (for  pre- 
and  post-conditions)  so  the  module  interfaces  (e.g.,  [P,  Q])  may 
allow some safe but unwanted behaviors. Such a gap is fine if the goal
is  to  just  prove  safety  (as  in  static  type-checking),  but  if  we  want 
to  prove  the  strong  contextual  correctness  property  across  module 
boundaries,  it  is  important  that  each  interface  accurately  describes 
the  functionality  and  scope  of  the  underlying  implementation. 

In  addition  to  the  obvious  benefits  on  compositionality,  our 
layered  approach  also  enables  a  new  powerful  way  of  combining 
programming and specification languages in a single setting. Each
layer  interface  enables  a  new  programming  language  at  a  specific 
abstraction  level,  which  is  then  used  to  implement  layers  at  even 
higher  levels.  As  we  move  up  the  layer  hierarchy,  our  programming 
language  gets  closer  and  closer  to  the  specification  language — it 
can  call  primitives  at  higher  abstraction  levels  but  it  still  supports 
general-purpose  programming  (e.g.,  in  ClightX). 

Interestingly,  we  did  not  need  to  introduce  any  program  logic 
to  verify  our  OS  kernel  code.  Instead,  we  verify  it  directly  using 
the ClightX (or LAsm) language semantics (which is already conveniently
parameterized over a layer interface). In fact, unlike Hoare
logic  which  shows  that  a  program  (e.g.,  C)  refines  a  specification 
(e.g.,  [P,  Q]),  we  instead  show  there  is  a  downward  simulation  from 
the  specification  to  the  program.  As  in  CompCert,  we  found  this 
easier  to  prove  and  we  can  do  this  because  both  our  specification 
and  language  semantics  are  deterministic  relative  to  external  events. 

Stepwise program refinement  Dijkstra [9] proposed to “realize”
a complex program by decomposing it into a hierarchy of linearly
ordered “abstract machines.” Based on this idea, the PSOS team at
SRI [27] developed the Hierarchical Development Methodology
(HDM) and applied HDM to design and specify an OS using
20 hierarchically organized modules. HDM was difficult to apply
rigorously in practice, probably because of the lack of
powerful  specification  and  proof  tools.  In  this  paper,  we  advance  the 
HDM  paradigm  by  using  a  new  formal  layer  language  to  connect 
multiple  layers  and  by  implementing  all  certified  layers  and  proofs 
in  a  modern  proof  assistant.  We  also  pursued  decomposition  more 
aggressively  since  it  made  our  verification  task  much  easier. 

Morgan’s refinement calculus [25] is a formalized approach to
Dijkstra’s stepwise refinement. Using this calculus, a high-level specification
can be refined through a series of correctness-preserving
transformations  and  eventually  turned  into  an  efficient  executable. 
Our  work  imposes  a  new  layer  language  to  enhance  compositional 
reasoning.  We  use  ClightX  (or  LAsm)  and  the  Coq  logic  as  our 
“refinement” language, and use a certified layer (with deep specification)
to represent each such correctness-preserving transformation.
All  our  ClightX  and  LAsm  instances  have  executable  semantics  and 
can  be  compiled  and  linked  using  our  new  CompCertX  compiler. 

Separate compilation for CompCert  Compositional compiler correctness
is an extremely challenging problem Eiiia, especially
when it involves an open compiler with multiple languages [29].
In the context of CompCert, a recent proposal d aims to tackle
the  full  Clight  language  but  it  has  not  been  fully  implemented  in 
the  CompCert  compiler.  While  our  CompCertX  compiler  proves  a 
stronger  correctness  theorem  for  each  ClightX  layer,  the  ClightX 
language  is  subtly  different  from  the  original  full-featured  Clight 
language.  Within  each  ClightX  layer,  all  locally  allocated  memory 
blocks  (e.g.,  stack  frames)  cannot  be  updated  by  functions  defined 
in  another  layer.  This  means  that  ClightX  does  not  support  the  same 
general  “stack-allocated  data  structures”  as  in  Clight.  This  is  fine 
for  our  OS  kernels  since  they  do  not  allocate  any  data  structures  on 
stack, but it means that CompCertX cannot be regarded as a
full-featured separate compiler for CompCert.

OS  kernel  verification  The  seL4  team  (13  were  the  first  to  build  a 
proof  of  functional  correctness  for  a  realistic  microkernel.  The  seL4 
work  is  impressive  in  that  all  the  proofs  were  done  inside  a  modern 
mechanized  proof  assistant.  They  have  shown  that  the  behaviors  of 
7500  lines  of  their  C  code  always  follow  an  abstract  specification 
of  their  kernel.  To  make  verification  easier,  they  introduced  an 
intermediate  executable  specification  to  hide  C  specifics.  Both  their 
abstract  and  executable  specifications  are  “monolithic”  as  they  are 
not  divided  into  layers  to  support  abstraction  among  different  kernel 
modules.  These  kernel  interdependencies  led  to  more  complex 
invariants, which may explain why their effort took 11 person years.

The  initial  seL4  effort  was  done  completely  at  the  C  level  so 
it  does  not  support  many  assembly  level  features  such  as  address 
translation.  This  also  made  verification  of  assembly  code  and  kernel 
initialization  difficult  (1200  lines  of  C  and  500  lines  of  assembly  are 
still  unverified).  It  is  also  unclear  how  to  use  their  verified  kernel 
to  reason  about  user-level  programs  since  they  would  be  running  in 
a  different  address  space.  Our  certified  kernels,  on  the  other  hand, 
directly model assembly-level machines that support all kernel/user
and  host/guest  programs.  Memory  access  to  a  user-level  address 
space  must  go  through  a  page  table,  and  memory  access  in  a  guest 
virtual  machine  must  go  through  a  nested  page  table.  We  thus  had 
no  problem  verifying  our  kernel  initialization  or  assembly  code. 

Modular verification of low-level code  Vaynberg and Shao [36]
also  used  a  layered  approach  to  verify  a  small  virtual  memory 
manager.  Their  layers  are  not  linearly  ordered;  instead,  their  seven 
abstract  machines  form  a  DAG  with  potential  upcalls  (i.e.,  calls 
from  a  lower  layer  to  upper  ones).  As  a  result,  their  initialization 
function  (an  upcall)  was  much  harder  to  verify.  Their  refinement 
proofs  between  layers  are  insensitive  to  termination,  from  which 
they  can  only  prove  partial  correctness  but  not  the  strong  contextual 
correctness  property  which  we  prove  in  our  current  work. 

Feng et al. (m developed OCAP, an open framework for
linking  components  verified  in  different  domain-specific  program 
logics.  They  verified  a  thread  library  with  hardware  interrupts  and 
preemption  Go)  using  a  variant  of  concurrent  separation  logic  ||2^. 
They  decomposed  the  thread  implementation  into  a  sequential  layer 
(with  interrupts  disabled)  and  a  concurrent  layer  (with  interrupts 
enabled). Chlipala [8] developed Bedrock, an automated Coq library
to  support  verified  low-level  programming.  All  these  systems  aimed 
to  prove  partial  correctness  only,  so  they  are  quite  different  from 
the  layered  simulation  proofs  given  in  this  paper. 

9.  Conclusions 

Abstraction  layers  are  key  techniques  used  in  building  large-scale 
computer  software  and  hardware.  In  this  paper,  we  have  presented 
a  novel  language-based  account  of  abstraction  layers  and  shown 
that  they  are  particularly  suitable  for  supporting  abstraction  over 
deep  specifications,  which  is  essential  for  compositional  verification 
of  strong  correctness  properties.  We  have  designed  a  new  layer 
language  and  imposed  it  on  two  different  core  languages  (ClightX 
and  LAsm).  We  have  also  built  a  verified  compiler  from  ClightX  to 
LAsm.  By  aggressively  decomposing  each  complex  abstraction 
into  smaller  abstraction  steps,  we  have  successfully  developed 
several  certified  OS  kernels  that  prove  deeper  properties  (contextual 
correctness),  contain  smaller  trusted  computing  bases  (all  code 
verified  at  the  assembly  level),  require  significantly  less  effort  (3000 
lines  of  C  and  assembly  code  proved  in  less  than  1  person  year), 
and  demonstrate  strong  support  for  extensibility  (layers  are  heavily 
reused  in  different  certified  kernels).  We  expect  that  both  deep 
specifications  and  certified  abstraction  layers  will  become  critical 
technologies and important building blocks for developing large-scale
certified system infrastructures in the future.

Abstract. Static analysis of the evaluation cost of programs is an extensively
studied problem that has many important applications. However,
most  automatic  methods  for  static  cost  analysis  are  limited  to  sequential 
evaluation  while  programs  are  increasingly  evaluated  on  modern  multicore 
and  multiprocessor  hardware.  This  article  introduces  the  first  automatic 
analysis  for  deriving  bounds  on  the  worst-case  evaluation  cost  of  parallel 
first-order  functional  programs.  The  analysis  is  performed  by  a  novel 
type  system  for  amortized  resource  analysis.  The  main  innovation  is  a 
technique  that  separates  the  reasoning  about  sizes  of  data  structures 
and  evaluation  cost  within  the  same  framework.  The  cost  semantics  of 
parallel  programs  is  based  on  call-by-value  evaluation  and  the  standard 
cost  measures  work  and  depth.  A  soundness  proof  of  the  type  system 
establishes  the  correctness  of  the  derived  cost  bounds  with  respect  to  the 
cost  semantics.  The  derived  bounds  are  multivariate  resource  polynomials 
which  depend  on  the  sizes  of  the  arguments  of  a  function.  Type  inference 
can  be  reduced  to  linear  programming  and  is  fully  automatic.  A  prototype 
implementation of the analysis system has been developed to experimentally
evaluate the effectiveness of the approach. The experiments show
that  the  analysis  infers  bounds  for  realistic  example  programs  such  as 
quick  sort  for  lists  of  lists,  matrix  multiplication,  and  an  implementation 
of  sets  with  lists.  The  derived  bounds  are  often  asymptotically  tight  and 
the  constant  factors  are  close  to  the  optimal  ones. 

Keywords: Functional Programming, Static Analysis, Resource Consumption,
Amortized Analysis


1  Introduction 

Static  analysis  of  the  resource  cost  of  programs  is  a  classical  subject  of  computer 
science.  Recently,  there  has  been  an  increased  interest  in  formally  proving  cost 
bounds  since  they  are  essential  in  the  verification  of  safety-critical  real-time  and 
embedded  systems. 

For sequential functional programs there exist many automatic and semi-automatic
analysis systems that can statically infer cost bounds. Most of them
are based on sized types [1], recurrence relations [2], and amortized resource
analysis [3,4]. The goal of these systems is to automatically compute easily
understood arithmetic expressions in the sizes of the inputs of a program that
bound resource cost such as time or space usage. Even though an automatic
computation of cost bounds is undecidable in general, novel analysis techniques are
able to efficiently compute tight time bounds for many non-trivial programs [5-9].

For  functional  programs  that  are  evaluated  in  parallel,  on  the  other  hand, 
no  such  analysis  system  exists  to  support  programmers  with  computer-aided 
derivation  of  cost  bounds.  In  particular,  there  are  no  type  systems  that  derive 
cost bounds for parallel programs. This is unsatisfying because parallel evaluation
is becoming increasingly important on modern hardware and referential
transparency  makes  functional  programs  ideal  for  parallel  evaluation. 

This  article  introduces  an  automatic  type-based  resource  analysis  for  deriving 
cost  bounds  for  parallel  first-order  functional  programs.  Automatic  cost  analysis 
for  sequential  programs  is  already  challenging  and  it  might  seem  to  be  a  long  shot 
to  develop  an  analysis  for  parallel  evaluation  that  takes  into  account  low-level 
features  of  the  underlying  hardware  such  as  the  number  of  processors.  Fortunately, 
it  has  been  shown  [10, 11]  that  the  cost  of  parallel  functional  programs  can  be 
analyzed  in  two  steps.  First,  we  derive  cost  bounds  at  a  high  abstraction  level 
where  we  assume  to  have  an  unlimited  number  of  processors  at  our  disposal. 
Second,  we  prove  once  and  for  all  how  the  cost  on  the  high  abstraction  level 
relates  to  the  actual  cost  on  a  specific  system  with  limited  resources. 

In  this  work,  we  derive  bounds  on  an  abstract  cost  model  that  consists  of 
the  work  and  the  depth  of  an  evaluation  of  a  program  [10].  Work  measures 
the  evaluation  time  of  sequential  evaluation  and  depth  measures  the  evaluation 
time  of  parallel  evaluation  assuming  an  unlimited  number  of  processors.  It  is 
well-known  [12]  that  a  program  that  evaluates  to  a  value  using  work  w  and  depth 
d  can  be  evaluated  on  a  shared-memory  multiprocessor  (SMP)  system  with  p 
processors in time $O(\max(w/p, d))$ (see Section 2.3). The mechanism that is used
to  prove  this  result  is  comparable  to  a  scheduler  in  an  operating  system. 
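
The relation between work, depth, and running time on p processors can be observed on a small example. The sketch below is only an illustration under simplifying assumptions: tasks have unit cost, the dependency graph is given explicitly as a dictionary, and the scheduler is a simple greedy one. The function names and the example graph are hypothetical. For such greedy schedules the number of steps T is known to satisfy max(w/p, d) <= T <= w/p + d, which is within a factor of two of max(w/p, d).

```python
from typing import Dict, List, Tuple

Dag = Dict[str, List[str]]  # task -> tasks it depends on

def work_and_depth(deps: Dag) -> Tuple[int, int]:
    """Work is the number of unit-cost tasks; depth is the longest chain."""
    memo: Dict[str, int] = {}

    def level(t: str) -> int:
        if t not in memo:
            memo[t] = 1 + max((level(u) for u in deps[t]), default=0)
        return memo[t]

    return len(deps), max(level(t) for t in deps)

def greedy_schedule_time(deps: Dag, p: int) -> int:
    """Run up to p ready tasks per step; return the number of steps used."""
    done, steps = set(), 0
    while len(done) < len(deps):
        ready = [t for t in deps
                 if t not in done and all(u in done for u in deps[t])]
        done.update(ready[:p])
        steps += 1
    return steps

# A fork-join computation: one fork, six independent leaves, one join.
leaves = ["a", "b", "c", "d", "e", "f"]
dag: Dag = {"fork": [], **{x: ["fork"] for x in leaves}, "join": leaves}

w, d = work_and_depth(dag)
for p in (1, 2, 3, 8):
    t = greedy_schedule_time(dag, p)
    # The greedy schedule stays between the two bounds for every p.
    assert max(w / p, d) <= t <= w / p + d
```

With one processor the schedule takes as many steps as the work, and with enough processors it takes as many steps as the depth, which is the intuition behind the two cost measures.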

A  novelty  in  the  cost  semantics  in  this  paper  is  the  definition  of  work  and 
depth for terminating and non-terminating evaluations. Intuitively, the non-deterministic
big-step evaluation judgement that is defined in Section 2 expresses
that  there  is  a  (possibly  partial)  evaluation  with  work  n  and  depth  m.  This 
statement  is  used  to  prove  that  a  typing  derivation  for  bounds  on  the  depth  or 
for  bounds  on  the  work  ensures  termination. 

Technically,  the  analysis  computes  two  separate  typing  derivations,  one  for  the 
work  and  one  for  the  depth.  To  derive  a  bound  on  the  work,  we  use  multivariate 
amortized  resource  analysis  for  sequential  programs  [13].  To  derive  a  bound 
on  the  depth,  we  develop  a  novel  multivariate  amortized  resource  analysis  for 
programs  that  are  evaluated  in  parallel.  The  main  challenge  in  the  design  of 
this  novel  parallel  analysis  is  to  ensure  the  same  high  compositionality  as  in 
the  sequential  analysis.  The  design  and  implementation  of  this  novel  analysis 
for  bounds  on  the  depth  of  evaluations  is  the  main  contribution  of  our  work. 
The  technical  innovation  that  enables  compositionality  is  an  analysis  method 
that  separates  the  static  tracking  of  size  changes  of  data  structures  from  the 
cost  analysis  while  using  the  same  framework.  We  envision  that  this  technique 
will  find  further  applications  in  the  analysis  of  other  non-additive  cost  such  as 
stack-space  usage  and  recursion  depth. 

We describe the new type analysis for parallel evaluation for a simple first-order
language with lists, pairs, pattern matching, and sequential and parallel
composition. This is already sufficient to study the cost analysis of parallel
programs. However, we implemented the analysis system in Resource Aware ML
(RAML), which also includes other inductive data types and conditionals [14]. To
demonstrate the universality of the approach, we also implemented NESL’s [15]
parallel list comprehensions as a primitive in RAML (see Section 6). Similarly, we
can define other parallel sequence operations of NESL as primitives and correctly
specify their work and depth. RAML is currently being extended to include higher-order
functions,  arrays,  and  user-defined  inductive  types.  This  work  is  orthogonal  to 
the  treatment  of  parallel  evaluation. 

To  evaluate  the  practicability  of  the  proposed  technique,  we  performed  an 
experimental  evaluation  of  the  analysis  using  the  prototype  implementation  in 
RAML. Note that the analysis computes worst-case bounds instead of average-case
bounds and that the asymptotic behavior of many of the classic examples
of Blelloch et al. [10] does not differ in parallel and sequential evaluations. For
instance, the depth and work of quick sort are both quadratic in the worst case.
Therefore,  we  focus  on  examples  that  actually  have  asymptotically  different 
bounds  for  the  work  and  depth.  This  includes  quick  sort  for  lists  of  lists  in 
which  the  comparisons  of  the  inner  lists  can  be  performed  in  parallel,  matrix 
multiplication  where  matrices  are  lists  of  lists,  a  function  that  computes  the 
maximal  weight  of  a  (continuous)  sublist  of  an  integer  list,  and  the  standard 
operations  for  sets  that  are  implemented  as  lists.  The  experimental  evaluation 
can  be  easily  reproduced  and  extended:  RAML  and  the  example  programs  are 
publicly available for download and through a user-friendly online interface [16].
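
The remark about quick sort can be checked on a small instrumented version. The sketch below is a hypothetical illustration, not RAML code and not the analysis itself: it charges one cost unit per element compared against the pivot, treats partitioning as sequential, and assumes the two recursive calls run in parallel, so that their work adds up while their depth is the maximum of the two.

```python
from typing import List, Tuple

def quicksort_cost(xs: List[int]) -> Tuple[List[int], int, int]:
    """Return (sorted list, work, depth) under the cost model described above."""
    if not xs:
        return [], 0, 0
    pivot, rest = xs[0], xs[1:]
    small = [x for x in rest if x < pivot]
    large = [x for x in rest if x >= pivot]
    part = len(rest)  # sequential partition: one unit per element of rest
    s, ws, ds = quicksort_cost(small)
    l, wl, dl = quicksort_cost(large)
    # Parallel recursive calls: work is summed, depth takes the maximum.
    return s + [pivot] + l, part + ws + wl, part + max(ds, dl)

n = 50
_, work, depth = quicksort_cost(list(range(n)))
# On an already sorted input one recursive call is always empty, so
# parallelism does not help: work and depth are both n(n-1)/2.
assert work == depth == n * (n - 1) // 2

_, work_b, depth_b = quicksort_cost([4, 2, 6, 1, 3, 5, 7])
assert depth_b < work_b  # on a balanced input the depth is smaller
```

Since the analysis derives worst-case bounds, the sorted input is the relevant case, which is why plain quick sort does not separate work from depth asymptotically.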

In summary, we make the following contributions.

1.  We  introduce  the  first  automatic  static  analysis  for  deriving  bounds  on  the 
depth  of  parallel  functional  programs.  Being  based  on  multivariate  resource 
polynomials  and  type-based  amortized  analysis,  the  analysis  is  compositional. 
The computed type derivations are easily-checkable bound certificates.

2.  We  prove  the  soundness  of  the  type-based  amortized  analysis  with  respect 
to  an  operational  big-step  semantics  that  models  the  work  and  depth  of 
terminating  and  non-terminating  programs.  This  allows  us  to  prove  that 
work  and  depth  bounds  ensure  termination.  Our  inductively  defined  big-step 
semantics  is  an  interesting  alternative  to  coinductive  big-step  semantics. 

3.  We  implemented  the  proposed  analysis  in  RAML,  a  first-order  functional 
language.  In  addition  to  the  language  constructs  like  lists  and  pairs  that  are 
formally  described  in  this  article,  the  implementation  includes  binary  trees, 
natural  numbers,  tuples,  Booleans,  and  NESL’s  parallel  list  comprehensions. 

4.  We  evaluated  the  practicability  of  the  implemented  analysis  by  performing 
reproducible  experiments  with  typical  example  programs.  Our  results  show 
that the analysis is efficient and works for a wide range of examples. The derived
bounds are usually asymptotically tight if the tight bound is expressible
as a resource polynomial.

The  full  version  of  this  article  [17]  contains  additional  explanations,  lemmas, 
and  details  of  the  technical  development. 

2  Cost Semantics for Parallel Programs

In  this  section,  we  introduce  a  first-order  functional  language  with  parallel  and 
sequential composition. We then define a big-step operational semantics that
formalizes  the  cost  measures  work  and  depth  for  terminating  and  non-terminating 
evaluations.  Finally,  we  prove  properties  of  the  cost  semantics  and  discuss  the 
relation  of  work  and  depth  to  the  run  time  on  hardware  with  finite  resources. 

2.1  Expressions  and  Programs 

Expressions  are  given  in  let-normal  form.  This  means  that  term  formers  are 
applied  to  variables  only  when  this  does  not  restrict  the  expressivity  of  the 
language.  Expressions  are  formed  by  integers,  variables,  function  applications, 
lists,  pairs,  pattern  matching,  and  sequential  and  parallel  composition. 

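\[
\begin{array}{rcl}
e, e_1, e_2 & ::= & n \mid x \mid f(x) \mid (x_1, x_2) \mid \mathsf{match}\ x\ \mathsf{with}\ (x_1, x_2) \Rightarrow e \\
 & \mid & \mathsf{nil} \mid \mathsf{cons}(x_1, x_2) \mid \mathsf{match}\ x\ \mathsf{with}\ (\mathsf{nil} \Rightarrow e_1 \mid \mathsf{cons}(x_1, x_2) \Rightarrow e_2) \\
 & \mid & \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \mid \mathsf{par}\ x_1 = e_1\ \mathsf{and}\ x_2 = e_2\ \mathsf{in}\ e
\end{array}
\]

The parallel composition $\mathsf{par}\ x_1 = e_1\ \mathsf{and}\ x_2 = e_2\ \mathsf{in}\ e$ is used to evaluate $e_1$ and
$e_2$ in parallel and bind the resulting values to the names $x_1$ and $x_2$ for use in $e$.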

In  the  prototype,  we  have  implemented  other  inductive  types  such  as  trees, 
natural  numbers,  and  tuples.  Additionally,  there  are  operations  for  primitive 
types  such  as  Booleans  and  integers,  and  NESL’s  parallel  list  comprehensions  [15]. 
Expressions  are  also  transformed  automatically  into  let  normal  form  before  the 
analysis.  In  the  examples  in  this  paper,  we  use  the  syntax  of  our  prototype 
implementation  to  improve  readability. 
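
As an illustration of what a transformation into let normal form does, the following sketch names every non-variable argument of a pair, cons, or function application with a fresh let-bound variable. It is a hypothetical reconstruction for a small fragment, not the prototype's actual implementation: the tuple encoding of expressions, the function name normalize, and the fresh names t1, t2, ... (assumed not to clash with program variables) are all assumptions of the example.

```python
import itertools
from typing import Iterator, Tuple

Expr = Tuple  # e.g. ("int", 1), ("var", "x"), ("nil",), ("pair", e1, e2),
              #      ("cons", e1, e2), ("app", "f", e), ("let", "x", e1, e2)

ATOMS = ("int", "var", "nil")

def normalize(e: Expr, counter: Iterator[int]) -> Expr:
    """Rewrite e so that pair, cons, and app are applied to variables only."""
    tag = e[0]
    if tag in ATOMS:
        return e
    if tag == "let":
        _, x, e1, e2 = e
        return ("let", x, normalize(e1, counter), normalize(e2, counter))
    if tag not in ("pair", "cons", "app"):
        raise ValueError(f"construct not covered by this sketch: {tag}")
    # Split the node into a fixed head and the arguments to be named.
    head, args = ((tag, e[1]), e[2:]) if tag == "app" else ((tag,), e[1:])
    bindings, names = [], []
    for a in args:
        if a[0] == "var":
            names.append(a)                      # already a variable
        else:
            x = f"t{next(counter)}"              # fresh name for the argument
            bindings.append((x, normalize(a, counter)))
            names.append(("var", x))
    result = head + tuple(names)
    for x, a in reversed(bindings):              # innermost binding last
        result = ("let", x, a, result)
    return result

src = ("pair", ("int", 1), ("cons", ("var", "y"), ("nil",)))
nf = normalize(src, itertools.count(1))
# nf corresponds to: let t1 = 1 in let t2 = (let t3 = nil in cons(y, t3)) in (t1, t2)
assert nf == ("let", "t1", ("int", 1),
              ("let", "t2",
               ("let", "t3", ("nil",), ("cons", ("var", "y"), ("var", "t3"))),
               ("pair", ("var", "t1"), ("var", "t2"))))
```

The let bindings introduced this way fix a left-to-right evaluation order for the arguments, which is the order a sequential cost semantics would then charge for.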

In the following, we define a standard type system for expressions and programs.
Data types $A, B$ and function types $F$ are defined as follows.

\[ A, B ::= \mathsf{int} \mid L(A) \mid A * B \qquad\qquad F ::= A \to B \]

Let $\mathcal{A}$ be the set of data types and let $\mathcal{F}$ be the set of function types. A signature
$\Sigma : \mathit{FID} \rightharpoonup \mathcal{F}$ is a partial finite mapping from function identifiers to function
types. A context is a partial finite mapping $\Gamma : \mathit{Var} \rightharpoonup \mathcal{A}$ from variable identifiers
to data types. A simple type judgement $\Sigma; \Gamma \vdash e : A$ states that the expression
$e$ has type $A$ in the context $\Gamma$ under the signature $\Sigma$. The definition of typing
rules for this judgement is standard and we omit the rules.

A (well-typed) program consists of a signature $\Sigma$ and a family $(e_f, y_f)_{f \in \mathrm{dom}(\Sigma)}$
of expressions $e_f$ with a distinguished variable identifier $y_f$ such that
$\Sigma; y_f{:}A \vdash e_f : B$ if $\Sigma(f) = A \to B$.

2.2  Big-Step  Operational  Semantics 

We  now  formalize  the  resource  cost  of  evaluating  programs  with  a  big-step 
operational  semantics.  The  focus  of  this  paper  is  on  time  complexity  and  we  only 
define the cost measures work and depth. Intuitively, the work measures the time
that  is  needed  in  a  sequential  evaluation.  The  depth  measures  the  time  that  is 
needed  in  a  parallel  evaluation.  In  the  semantics,  time  is  parameterized  by  a 
metric  that  assigns  a  non-negative  cost  to  each  evaluation  step. 
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_  (E:Abort) 

V,H  I - lets;  =  ei  ine2  o  |  M''*+d)  - - 

H^ellol  (0,0) 

E,H  ei  JJ.  I  (wi,di)  y[a;  62  JJ.  p  I  (W2,d2) 

- ^ - ; - ; - (E:Let2) 

V,H  I -  let*  =  61  in  62  JJ.  p  I  (M°*+wi+W2,  M'^*+di+d2) 

E, //  6l  JJ.  pi  I  (wi,  dl)  E,  dd  62  JJ.  p2  I  (W2,d2)  Pl=0Vp2=0 

- jTT - (E:Par1) 

E,  dd  par  xi  =  6i  and  *2  =  62  in  6  (1  o  I  +W1+W2,  max(di,  d2)) 

(E:Par2) 

E,  dd  H^6i  JJ.  (dijddi)  I  (luijdi)  (w',  d')  =  (M’’^''+Wi+ui2+w,  Ad’’'"'+ max(di,  d2)+d) 

E,dd  62  (1  (d2,dd2)  I  (W2,d2)  E[*lH^dl,*2H^d2],ddlwdd2  6  (!(£,  ddQ  |  (w,  d) 
E,  H'  par  *1  =  6i  and  *2  =  62  in  6  JJ.  (d,  dd')  |  (w',  d') 

Fig.  1.  Interesting  rules  of  the  operational  big-step  semantics. 

Motivation.  A  distinctive  feature  of  our  big-step  semantics  is  that  it  models 
terminating,  failing,  and  diverging  evaluations  by  inductively  describing  finite 
subtrees  of  (possibly  infinite)  evaluation  trees.  By  using  an  inductive  judgement 
for  diverging  and  terminating  computations  while  avoiding  intermediate  states, 
it  combines  the  advantages  of  big-step  and  small-step  semantics.  This  has  two 
benefits  compared  to  standard  big-step  semantics.  First,  we  can  model  the  resource 
consumption  of  diverging  programs  and  prove  that  bounds  hold  for  terminating 
and  diverging  programs.  (In  some  cost  metrics,  diverging  computations  can  have 
finite  cost.)  Second,  for  a  cost  metric  in  which  all  diverging  computations  have 
infinite  cost  we  are  able  to  show  that  bounds  imply  termination. 

Note that we cannot achieve this by step-indexing a standard big-step
semantics. The available alternatives to our approach are small-step semantics and
coinductive big-step semantics. However, it is unclear how to prove the soundness
of our type system with respect to these semantics. Small-step semantics is
difficult to use because our type system models an intensional property that goes
beyond  the  classic  type  preservation:  After  performing  a  step,  we  have  to  obtain 
a  refined  typing  that  corresponds  to  a  (possibly)  smaller  bound.  Coinductive 
derivations  are  hard  to  relate  to  type  derivations  because  type  derivations  are 
defined  inductively. 

Our  inductive  big-step  semantics  can  not  only  be  used  to  formalize  resource 
cost  of  diverging  computations  but  also  for  other  effects  such  as  event  traces.  It  is 
therefore  an  interesting  alternative  to  recently  proposed  coinductive  operational 
big-step  semantics  [18]. 

Semantic Judgements. We formulate the big-step semantics with respect to
a stack and a heap. Let $\mathit{Loc}$ be an infinite set of locations modeling memory
addresses on a heap. A value $v ::= n \mid (\ell_1,\ell_2) \mid (\mathrm{cons},\ell_1,\ell_2) \mid \mathrm{nil} \in \mathit{Val}$ is either
an integer $n \in \mathbb{Z}$, a pair of locations $(\ell_1,\ell_2)$, a node $(\mathrm{cons},\ell_1,\ell_2)$ of a list, or nil.

A heap is a finite partial mapping $H : \mathit{Loc} \rightharpoonup \mathit{Val}$ that maps locations to
values. A stack is a finite partial mapping $V : \mathit{Var} \rightharpoonup \mathit{Loc}$ from variable identifiers
to locations. Thus we have boxed values. It is not important for the analysis
whether values are boxed.

Figure 1 contains a compilation of the big-step evaluation rules (the full
version contains all rules). They are formulated with respect to a resource metric
$M$. They define the evaluation judgment

$$V,H \vdash_M e \Downarrow \rho \mid (w,d) \quad \text{where} \quad \rho ::= (\ell,H) \mid \circ\,.$$

It expresses the following. In a fixed program $(e_f, y_f)_{f \in \mathrm{dom}(\Sigma)}$, if the stack $V$
and the initial heap $H$ are given then the expression $e$ evaluates to $\rho$. Under the
metric $M$, the work of the evaluation of $e$ is $w$ and the depth of the evaluation
is $d$. Unlike standard big-step operational semantics, $\rho$ can be either a pair of a
location and a new heap, or $\circ$ (pronounced busy) indicating that the evaluation
is not finished yet.

A resource metric $M : K \to \mathbb{Q}^+_0$ defines the resource consumption in each
evaluation step of the big-step semantics with a non-negative rational number.
We write $M^k$ for $M(k)$.

An intuition for the judgement $V,H \vdash_M e \Downarrow \circ \mid (w,d)$ is that there is a
partial evaluation of $e$ that runs without failure, has work $w$ and depth $d$, and
has not yet reached a value. This is similar to a small-step judgement.

Rules. For a heap $H$, we write $H, \ell \mapsto v$ to express that $\ell \notin \mathrm{dom}(H)$ and to
denote the heap $H'$ such that $H'(x) = H(x)$ if $x \in \mathrm{dom}(H)$ and $H'(\ell) = v$.
In the rule E:Par2, we write $H_1 \uplus H_2$ to indicate that $H_1$ and $H_2$ agree on
the values of locations in $\mathrm{dom}(H_1) \cap \mathrm{dom}(H_2)$ and to denote a combined heap $H$ with
$\mathrm{dom}(H) = \mathrm{dom}(H_1) \cup \mathrm{dom}(H_2)$. We assume that the locations that are allocated
in parallel evaluations are disjoint. That is easily achievable in an implementation.

The most interesting rules of the semantics are E:Abort, and the rules
for sequential and parallel composition. They allow us to approximate infinite
evaluation trees for non-terminating evaluations with finite subtrees. The rule
E:Abort states that we can partially evaluate every expression by doing zero
steps. The work $w$ and depth $d$ are then both zero (i.e., $w = d = 0$).

To obtain an evaluation judgement for a sequential composition let $x = e_1$ in $e_2$
we have two options. We can use the rule E:Let1 to partially evaluate $e_1$ using
work $w$ and depth $d$. Alternatively, we can use the rule E:Let2 to evaluate $e_1$
until we obtain a location and a heap $(\ell,H')$ using work $w_1$ and depth $d_1$. Then
we evaluate $e_2$ using work $w_2$ and depth $d_2$. The total work and depth is then
given by $M^{\mathrm{let}}+w_1+w_2$ and $M^{\mathrm{let}}+d_1+d_2$, respectively.

Similarly, we can derive evaluation judgements for a parallel composition
par $x_1 = e_1$ and $x_2 = e_2$ in $e$ using the rules E:Par1 and E:Par2. In the rule
E:Par1, we partially evaluate $e_1$ or $e_2$ with evaluation cost $(w_1,d_1)$ and $(w_2,d_2)$.
The total work is then $M^{\mathrm{Par}}+w_1+w_2$ (the cost for the evaluation of the parallel
binding plus the cost for the sequential evaluation of $e_1$ and $e_2$). The total depth is
$M^{\mathrm{Par}}+\max(d_1,d_2)$ (the cost for the evaluation of the binding plus the maximum
of the depths of $e_1$ and $e_2$). The rule E:Par2 handles the case in
which $e_1$ and $e_2$ are fully evaluated. It is similar to E:Let2 and the cost of the
evaluation of the expression $e$ is added to both the work and the depth since $e$ is
evaluated after $e_1$ and $e_2$.
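
The cost bookkeeping of the rules E:Let1, E:Let2, E:Par1, and E:Par2 can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the function names are hypothetical, costs are passed as
(work, depth) pairs, and the metric constants M_LET and M_PAR are set to 1 for
the example, whereas the semantics leaves them abstract.

```python
# Assumed metric constants (hypothetical values; the semantics keeps them abstract).
M_LET = 1
M_PAR = 1


def let1(c1):
    """E:Let1: e1 is only partially evaluated with cost c1 = (w, d)."""
    w, d = c1
    return (M_LET + w, M_LET + d)


def let2(c1, c2):
    """E:Let2: e1 is fully evaluated, then e2 runs; work and depth both add up."""
    (w1, d1), (w2, d2) = c1, c2
    return (M_LET + w1 + w2, M_LET + d1 + d2)


def par1(c1, c2):
    """E:Par1: at least one branch is still busy; the body has not started."""
    (w1, d1), (w2, d2) = c1, c2
    return (M_PAR + w1 + w2, M_PAR + max(d1, d2))


def par2(c1, c2, c_body):
    """E:Par2: both branches finished; the body cost is added to work and depth."""
    (w1, d1), (w2, d2), (w, d) = c1, c2, c_body
    return (M_PAR + w1 + w2 + w, M_PAR + max(d1, d2) + d)


# Two branches with costs (3, 3) and (5, 5), followed by a body with cost (2, 2).
print(let2((3, 3), (5, 5)))          # sequential: work and depth coincide
print(par2((3, 3), (5, 5), (2, 2)))  # parallel: depth uses the maximum
```

Work is always summed, while depth takes the maximum over the two parallel
branches; this is the only place where the two measures diverge.
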

2.3 Properties of the Cost-Semantics

The  main  theorem  of  this  section  states  that  the  resource  cost  of  a  partial 
evaluation  is  less  than  or  equal  to  the  cost  of  an  evaluation  of  the  same  expression 
that  terminates. 

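Theorem 1. If $V,H \vdash_M e \Downarrow (\ell,H') \mid (w,d)$ and $V,H \vdash_M e \Downarrow \circ \mid (w',d')$ then
$w' \le w$ and $d' \le d$.

Theorem 1 can be proved by a straightforward induction on the derivation of the
judgement $V,H \vdash_M e \Downarrow (\ell,H') \mid (w,d)$.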

Provably  Efficient  Implementations.  While  work  is  a  realistic  cost-model 
for  the  sequential  execution  of  programs,  depth  is  not  a  realistic  cost-model  for 
parallel  execution.  The  main  reason  is  that  it  assumes  that  an  infinite  number  of 
processors  can  be  used  for  parallel  evaluation.  However,  it  has  been  shown  [10] 
that  work  and  depth  are  closely  related  to  the  evaluation  time  on  more  realistic 
abstract  machines. 

For example, Brent’s Theorem [12] provides an asymptotic bound on the
number of execution steps on the shared-memory multiprocessor (SMP) machine.
It states that if $V,H \vdash_M e \Downarrow (\ell,H') \mid (w,d)$ then $e$ can be evaluated on a
$p$-processor SMP machine in time $O(\max(w/p, d))$. An SMP machine has a fixed
number $p$ of processors and provides constant-time access to a shared memory. The
proof of Brent’s Theorem can be seen as the description of a so-called provably
efficient implementation, that is, an implementation for which we can establish
an asymptotic bound that depends on the number of processors.
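
The bound can be illustrated with a simplified scheduling model. The sketch
below is an assumption-laden illustration, not the abstract machine from the
cited work: a computation is described only by the number of independent
unit-cost operations on each level of its dependency graph, and the
hypothetical function level_schedule_steps processes one level after another
on p processors.

```python
import math


def work_and_depth(level_sizes):
    """Work is the total number of operations, depth the number of levels."""
    return sum(level_sizes), len(level_sizes)


def level_schedule_steps(level_sizes, p):
    """Steps needed if each level is finished by p processors before the next starts."""
    return sum(math.ceil(n / p) for n in level_sizes)


levels = [1, 8, 8, 4, 2, 1]  # hypothetical computation, every level non-empty
w, d = work_and_depth(levels)
for p in (1, 2, 4, 16):
    steps = level_schedule_steps(levels, p)
    # Each level costs at most n/p + 1 steps, so steps <= w/p + d <= 2 * max(w/p, d).
    assert max(w / p, d) <= steps <= w / p + d
    print(p, steps)
```

Since w/p + d is at most twice max(w/p, d), the level-by-level schedule stays
within the asymptotic bound stated above.
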

Classically, we are especially interested in non-asymptotic bounds in resource
analysis. It would thus be interesting to develop a non-asymptotic version of
Brent’s Theorem for a specific architecture using more refined models of
concurrency [11]. However, such a development is not in the scope of this article.

Well-Formed Environments and Type Soundness. For each data type $A$
we inductively define a set $[\![A]\!]$ of values of type $A$. Lists are interpreted as lists
and pairs are interpreted as pairs.

$$[\![\mathrm{int}]\!] = \mathbb{Z} \qquad [\![A * B]\!] = [\![A]\!] \times [\![B]\!]$$

$$[\![L(A)]\!] = \{[a_1,\ldots,a_n] \mid n \in \mathbb{N},\ a_i \in [\![A]\!]\}$$

If $H$ is a heap, $\ell$ is a location, $A$ is a data type, and $a \in [\![A]\!]$ then we write
$H \models \ell \mapsto a : A$ to mean that $\ell$ defines the semantic value $a \in [\![A]\!]$ when pointers
are followed in $H$ in the obvious way. The judgment is formally defined in the
full version of the article.

We write $H \models \ell : A$ to indicate that there exists a, necessarily unique, semantic
value $a \in [\![A]\!]$ so that $H \models \ell \mapsto a : A$. A stack $V$ and a heap $H$ are well-formed
with respect to a context $\Gamma$ if $H \models V(x) : \Gamma(x)$ holds for every $x \in \mathrm{dom}(\Gamma)$. We
then write $H \models V : \Gamma$.
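
Following pointers "in the obvious way" can be sketched as below. The encoding
is an assumption made for this illustration only: heaps and stacks are Python
dictionaries, locations are integers, types are written as "int",
("pair", A, B), and ("list", A), and the names semantic_value and well_formed
are hypothetical. The sketch assumes a finite, acyclic heap.

```python
def semantic_value(heap, loc, ty):
    """Return the semantic value that location loc defines at type ty in heap."""
    if loc not in heap:
        raise ValueError(f"dangling location {loc}")
    v = heap[loc]
    if ty == "int":
        if isinstance(v, int):
            return v
        raise ValueError(f"location {loc} does not hold an integer")
    if ty[0] == "pair":
        if isinstance(v, tuple) and len(v) == 2:
            return (semantic_value(heap, v[0], ty[1]),
                    semantic_value(heap, v[1], ty[2]))
        raise ValueError(f"location {loc} does not hold a pair")
    if ty[0] == "list":
        items = []
        while v != "nil":
            if not (isinstance(v, tuple) and len(v) == 3 and v[0] == "cons"):
                raise ValueError("malformed list node")
            _, head, tail = v
            items.append(semantic_value(heap, head, ty[1]))
            if tail not in heap:
                raise ValueError(f"dangling location {tail}")
            v = heap[tail]
        return items
    raise ValueError(f"unknown type {ty}")


def well_formed(stack, heap, context):
    """Check that every variable of the context points to a value of its type."""
    try:
        for x, ty in context.items():
            semantic_value(heap, stack[x], ty)
    except (KeyError, ValueError):
        return False
    return True


# The list [9, 7] stored as boxed cons cells.
heap = {0: 7, 1: "nil", 2: ("cons", 0, 1), 3: 9, 4: ("cons", 3, 2)}
print(semantic_value(heap, 4, ("list", "int")))
print(well_formed({"ys": 4, "x": 0}, heap, {"ys": ("list", "int"), "x": "int"}))
```
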

Simple Metrics and Progress. In the remainder of this section, we prove a
property of the evaluation judgement under a simple metric. A simple metric $M$
assigns the value 1 to every resource constant, that is, $M(x) = 1$ for every $x \in K$.
With a simple metric, work counts the number of evaluation steps.

Theorem 2 states that, in a well-formed environment, well-typed expressions
either evaluate to a value or the evaluation uses unbounded work and depth.

Theorem 2 (Progress). Let $M$ be a simple metric, $\Sigma; \Gamma \vdash e : B$, and
$H \models V : \Gamma$. Then $V,H \vdash_M e \Downarrow (\ell,H') \mid (w,d)$ for some $w,d \in \mathbb{N}$ or for every $n \in \mathbb{N}$
there exist $x,y \in \mathbb{N}$ such that $V,H \vdash_M e \Downarrow \circ \mid (x,n)$ and $V,H \vdash_M e \Downarrow \circ \mid (n,y)$.

A  direct  consequence  of  Theorem  2  is  that  bounds  on  the  depth  of  programs 
under  a  simple  metric  ensure  termination. 


3  Amortized  Analysis  and  Parallel  Programs 

In this section, we give a short introduction to amortized resource analysis for
sequential  programs  (for  bounding  the  work)  and  then  informally  describe  the 
main  contribution  of  the  article:  a  multivariate  amortized  resource  analysis  for 
parallel  programs  (for  bounding  the  depth). 

Amortized Resource Analysis. Amortized resource analysis is a type-based
technique for deriving upper bounds on the resource cost of programs [3]. The
advantages of amortized resource analysis are compositionality and efficient
type inference that is based on linear programming. The idea is that types are
decorated with resource annotations that describe a potential function. Such
a potential function maps the sizes of typed data structures to a non-negative
rational number. The typing rules ensure that the potential defined by a typing
context is sufficient to pay for the evaluation cost of the expression that is typed
under this context and for the potential of the result of the evaluation.

The basic idea of amortized analysis is best explained by example. Consider
the function mult : int * L(int) -> L(int) that takes an integer and an integer list
and multiplies each element of the list by the integer.

    mult(x,ys) = match ys with | nil -> nil
                               | (y::ys') -> x*y::mult(x,ys')

For simplicity, we assume in this section a metric $M^*$ that only counts the number
of multiplications performed in an evaluation. Then
$V,H \vdash_{M^*} \mathrm{mult}(x, ys) \Downarrow (\ell,H') \mid (n,n)$ for a well-formed stack $V$ and heap $H$ in which ys points to a list
of length $n$. In short, the work and depth of the evaluation of mult(x, ys) is |ys|.
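
As a sanity check of this claim, the following Python sketch instruments a
version of mult with a counter for multiplications. It is an illustration
under assumptions, not the prototype's code: the loop replaces the recursion of
the definition above, and the returned triple (result, work, depth) is a
hypothetical encoding of the evaluation judgement under the metric $M^*$.

```python
def mult(x, ys):
    """Multiply every element of ys by x and count the multiplications."""
    result, work = [], 0
    for y in ys:
        result.append(x * y)
        work += 1  # one multiplication per list element
    # mult is purely sequential, so its depth equals its work.
    return result, work, work


print(mult(3, [1, 2, 3]))
```
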

To  obtain  a  bound  on  the  work  in  type-based  amortized  resource  analysis,  we 
derive  a  type  of  the  following  form. 

$$x{:}\mathrm{int},\ ys{:}L(\mathrm{int});\ Q \vdash_{M^*} \mathrm{mult}(x, ys) : (L(\mathrm{int}), Q')$$

Here $Q$ and $Q'$ are coefficients of multivariate resource polynomials
$p_Q : [\![\mathrm{int} * L(\mathrm{int})]\!] \to \mathbb{Q}^+_0$ and $p_{Q'} : [\![L(\mathrm{int})]\!] \to \mathbb{Q}^+_0$ that map semantic values to non-negative
rational numbers. The rules of the type system ensure that for every evaluation
context $(V, H)$ that maps x to a number $m$ and ys to a list $a$, the potential
$p_Q(m,a)$ is sufficient to cover the evaluation cost of mult(x, ys) and the potential
$p_{Q'}(a')$ of the returned list $a'$. More formally, we have $p_Q(m,a) \ge w + p_{Q'}(a')$ if
$V,H \vdash_{M^*} \mathrm{mult}(x, ys) \Downarrow (\ell,H') \mid (w,d)$ and $\ell$ points to the list $a'$ in $H'$.

In our type system we can for instance derive coefficients $Q$ and $Q'$ that
represent the potential functions

$$p_Q(n,a) = |a| \quad \text{and} \quad p_{Q'}(a) = 0\,.$$

The  intuitive  meaning  is  that  we  must  have  the  potential  |ys|  available  when 
evaluating  mult(x,  ys).  During  the  evaluation,  the  potential  is  used  to  pay  for  the 
evaluation  cost  and  we  have  no  potential  left  after  the  evaluation. 

To  enable  compositionality,  we  also  have  to  be  able  to  pass  potential  to  the 
result  of  an  evaluation.  Another  possible  instantiation  of  Q  and  Q'  would  for 
example  result  in  the  following  potential. 

$$p_Q(n,a) = 2 \cdot |a| \quad \text{and} \quad p_{Q'}(a) = |a|$$

The  resulting  typing  can  be  read  as  follows.  To  evaluate  mult(x,  ys)  we  need  the 
potential  2|ys|  to  pay  for  the  cost  of  the  evaluation.  After  the  evaluation  there  is 
the  potential  |mult(x, ys)|  left  to  pay  for  future  cost  in  a  surrounding  program. 
Such  an  instantiation  would  be  needed  to  type  the  inner  function  application  in 
the  expression  mult(x,  mult(z,  ys)). 
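
Both instantiations can be checked numerically against the soundness
condition $p_Q(m,a) \ge w + p_{Q'}(a')$. The Python sketch below is only an
illustration: the helper names are hypothetical, potentials are passed as
plain functions on lists, and the cost is the number of multiplications as in
the metric $M^*$.

```python
def mult(x, ys):
    """Return the product list and the number of multiplications performed."""
    return [x * y for y in ys], len(ys)


def typing_is_sound(p_in, p_out, x, ys):
    """Check p_in(ys) >= work + p_out(result) for one concrete input."""
    result, work = mult(x, ys)
    return p_in(ys) >= work + p_out(result)


# First instantiation: potential |a| before the call, 0 afterwards.
first = (lambda a: len(a), lambda a: 0)
# Second instantiation: potential 2|a| before the call, |a| afterwards.
second = (lambda a: 2 * len(a), lambda a: len(a))


def nested_call_is_covered(x, z, ys):
    """mult(x, mult(z, ys)): type the inner call with `second`, the outer with `first`."""
    inner, w_inner = mult(z, ys)
    left_over = second[0](ys) - w_inner           # potential remaining after the inner call
    _, w_outer = mult(x, inner)
    return left_over >= second[1](inner) >= w_outer


ys = [4, 8, 15, 16, 23, 42]
print(typing_is_sound(*first, 2, ys), typing_is_sound(*second, 2, ys))
print(nested_call_is_covered(2, 3, ys))
```

Note that the first instantiation would not suffice for the inner call of the
nested expression, because it leaves no potential on the intermediate list.
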

Technically, the coefficients $Q$ and $Q'$ are families that are indexed by sets
of base polynomials. The set of base polynomials is determined by the type
of the corresponding data. For the type int * L(int), we have for example
$Q = \{q_{(*,[])}, q_{(*,[*])}, q_{(*,[*,*])}, \ldots\}$ and
$p_Q(n,a) = q_{(*,[])} + q_{(*,[*])} \cdot |a| + q_{(*,[*,*])} \cdot \binom{|a|}{2} + \ldots$.
This allows us to express multivariate functions such as $m \cdot n$.

The  rules  of  our  type  system  show  how  to  describe  the  valid  instantiations  of 
the  coefficients  Q  and  Q'  with  a  set  of  linear  inequalities.  As  a  result,  we  can  use 
linear  programming  to  infer  resource  bounds  efficiently. 

A  more  in-depth  discussion  can  be  found  in  the  literature  [3, 19,  7]. 

Sequential Composition. In a sequential composition let $x = e_1$ in $e_2$, the
initial potential, defined by a context and a corresponding annotation $(\Gamma, Q)$,
has to be used to pay for the work of the evaluation of $e_1$ and the work of the
evaluation of $e_2$. Let us consider a concrete example again.

    mult2(ys) = let xs = mult(496,ys) in
                let zs = mult(8128,ys) in (xs,zs)

The work (and depth) of the evaluation of the expression mult2(ys) is 2|ys| in the
metric $M^*$. In the type judgement, we express this bound as follows. First, we
type the two function applications of mult as before using

$$x{:}\mathrm{int},\ ys{:}L(\mathrm{int});\ Q \vdash_{M^*} \mathrm{mult}(x, ys) : (L(\mathrm{int}), Q')$$

where $p_Q(n,a) = |a|$ and $p_{Q'}(a) = 0$. In the type judgement

$$ys{:}L(\mathrm{int});\ R \vdash_{M^*} \mathrm{mult2}(ys) : (L(\mathrm{int}) * L(\mathrm{int}), R')$$

we require that $p_R(a) \ge p_Q(a) + p_Q(a)$, that is, the initial potential (defined by the
coefficients $R$) has to be shared in the two sequential branches. Such a sharing can
still be expressed with linear constraints, such as $r_{[*]} \ge q_{(*,[*])} + q_{(*,[*])}$. A valid
instantiation of $R$ would thus correspond to the potential function $p_R(a) = 2|a|$.
With this instantiation, the previous typing reflects the bound 2|ys| for the
evaluation of mult2(ys).

A slightly more involved example is the function dyad : L(int) * L(int) ->
L(L(int)) which computes the dyadic product of two integer lists.

    dyad (u,v) = match u with | nil -> nil
                              | (x::xs) -> let x' = mult(x,v) in
                                           let xs' = dyad(xs,v) in x'::xs';

Using the metric $M^*$ that counts multiplications, multivariate resource analysis
for sequential programs derives the bound $|u| \cdot |v|$. In the cons branch of the
pattern match, we have the potential $|xs| \cdot |v| + |v|$ which is shared to pay for the
cost $|v|$ of mult(x, v) and the cost $|xs| \cdot |v|$ of dyad(xs,v).

Moving multivariate potential through a program is not trivial, especially in
the presence of nested data structures like trees of lists. To give an idea of the
challenges, consider the expression $e$ that is defined as follows.

    let xs = mult(496,ys) in
    let zs = append(ys,ys) in dyad(xs,zs)

The depth of evaluating $e$ in the metric $M^*$ is bounded by $|ys| + 2|ys|^2$. Like
in the previous example, we express this in amortized resource analysis with
the initial potential $|ys| + 2|ys|^2$. This potential has to be shared to pay for the
cost of the evaluations of mult(496,ys) (namely $|ys|$) and dyad(xs,zs) (namely
$2|ys|^2$). However, the type of dyad requires the quadratic potential $|xs| \cdot |zs|$. In
this simple example, it is easy to see that $|xs| \cdot |zs| = 2|ys|^2$. But in general, it is
not straightforward to compute such a conversion of potential in an automatic
analysis system, especially for nested data structures and super-linear size changes.
The type inference for multivariate amortized resource analysis for sequential
programs can analyze such programs efficiently [7].
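
The stated costs can be reproduced by counting multiplications directly. The
Python sketch below is an illustration under assumptions: it uses loops instead
of the recursive definitions, treats append as free (as the metric $M^*$ does),
and the name example_e is a hypothetical label for the expression $e$.

```python
def mult(x, v):
    """Return the product list and the number of multiplications."""
    return [x * y for y in v], len(v)


def dyad(u, v):
    """Sequential dyadic product; the cost is the sum over all calls to mult."""
    rows, cost = [], 0
    for x in u:
        row, c = mult(x, v)
        rows.append(row)
        cost += c
    return rows, cost


def example_e(ys):
    """let xs = mult(496,ys) in let zs = append(ys,ys) in dyad(xs,zs)"""
    xs, c1 = mult(496, ys)
    zs = ys + ys  # append performs no multiplications
    rows, c2 = dyad(xs, zs)
    return rows, c1 + c2


for n in range(6):
    ys = list(range(n))
    assert dyad(ys, ys + [0])[1] == n * (n + 1)   # |u| * |v|
    assert example_e(ys)[1] == n + 2 * n * n      # |ys| + 2|ys|^2
print(example_e([1, 2, 3])[1])
```
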

Parallel Composition. The insight of this paper is that the potential method
also works well to derive bounds on parallel evaluations. The main challenge in
the  development  of  an  amortized  resource  analysis  for  parallel  evaluations  is  to 
ensure  the  same  compositionality  as  in  sequential  amortized  resource  analysis. 

The  basic  idea  of  our  new  analysis  system  is  to  allow  each  branch  in  a  parallel 
evaluation  to  use  all  the  available  potential  without  sharing.  Consider  for  example 
the  previously  defined  function  mult2  in  which  we  evaluate  the  two  applications 
of  mult  in  parallel. 

    mult2par(ys) = par xs = mult(496,ys)
                   and zs = mult(8128,ys) in (xs,zs)

Since  the  depth  of  mult(n,ys)  is  |ys|  for  every  n  and  the  two  applications  of  mult 
are  evaluated  in  parallel,  the  depth  of  the  evaluation  of  mult2par(ys)  is  |ys|  in  the 
metric  M*. 

In the type judgement, we type the two function applications of mult as in
the sequential case in which

$$x{:}\mathrm{int},\ ys{:}L(\mathrm{int});\ Q \vdash_{M^*} \mathrm{mult}(x, ys) : (L(\mathrm{int}), Q')$$

such that $p_Q(n,a) = |a|$ and $p_{Q'}(a) = 0$. In the type judgement

$$ys{:}L(\mathrm{int});\ R \vdash_{M^*} \mathrm{mult2par}(ys) : (L(\mathrm{int}) * L(\mathrm{int}), R')$$

for mult2par we require however only that $p_R(a) \ge p_Q(a)$. In this way, we express
that the initial potential defined by the coefficients $R$ has to be sufficient to
cover the cost of each parallel branch. Consequently, a possible instantiation of
$R$ corresponds to the potential function $p_R(a) = |a|$.

In  the  function  dyad,  we  can  replace  the  sequential  computation  of  the  inner 
lists  of  the  result  by  a  parallel  computation  in  which  we  perform  all  calls  to  the 
function  mult  in  parallel.  The  resulting  function  is  dyad_par. 

    dyad_par (u,v) = match u with | nil -> nil
                                  | (x::xs) -> par x' = mult(x,v)
                                               and xs' = dyad_par(xs,v) in x'::xs';

The depth of dyad_par is $|v|$. In the type-based amortized analysis, we hence start
with the initial potential $|v|$. In the cons branch of the pattern match, we can
use the initial potential to pay for both the cost $|v|$ of mult(x, v) and the cost $|v|$
of the recursive call dyad_par(xs, v) without sharing the initial potential.
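
The depth claims for mult2par and dyad_par can be checked with a small cost
model. In the Python sketch below, every function returns a hypothetical
triple (value, work, depth); the helper par is an assumption that mirrors the
parallel binding under the metric $M^*$, in which the binding itself is free.

```python
def mult(x, ys):
    """Value, work, and depth of mult; one multiplication per element."""
    n = len(ys)
    return [x * y for y in ys], n, n


def par(r1, r2):
    """Combine two parallel branches: work adds up, depth is the maximum."""
    (v1, w1, d1), (v2, w2, d2) = r1, r2
    return (v1, v2), w1 + w2, max(d1, d2)


def mult2par(ys):
    return par(mult(496, ys), mult(8128, ys))


def dyad_par(u, v):
    if not u:
        return [], 0, 0
    (row, rest), w, d = par(mult(u[0], v), dyad_par(u[1:], v))
    return [row] + rest, w, d


_, work, depth = mult2par([1, 2, 3])
print(work, depth)   # work 2|ys|, depth |ys|
_, work, depth = dyad_par([1, 2, 3], [4, 5])
print(work, depth)   # work |u||v|, depth |v|
```
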

Unfortunately,  the  compositionality  of  the  sequential  system  is  not  preserved 
by  this  simple  idea.  The  problem  is  that  the  naive  reuse  of  potential  that  is 
passed  through  parallel  branches  would  break  the  soundness  of  the  system.  To 
see  why,  consider  the  following  function. 

    mult4(ys) = par xs = mult(496,ys)
                and zs = mult(8128,ys) in (mult(5,xs), mult(10,zs))

Recall that a valid typing for xs = mult(496,ys) could take the initial potential
$2|ys|$ and assign the potential $|xs|$ to the result. If we simply reused the
potential $2|ys|$ to type the second application of mult in the same way then we
would have the potential $|xs| + |zs|$ after the parallel branches. This potential
could then be used to pay for the cost of the remaining two applications of mult.
We would then have verified the unsound bound $2|ys|$ on the depth of the evaluation of
the expression mult4(ys), but the depth of the evaluation is $3|ys|$.
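
The actual depth of mult4 can be confirmed with the same kind of cost model.
The sketch below is an illustration under assumptions: functions return a
hypothetical triple (value, work, depth), the parallel binding is free as in
the metric $M^*$, and the two calls in the result pair are evaluated one after
the other.

```python
def mult(x, ys):
    """Value, work, and depth of mult; one multiplication per element."""
    n = len(ys)
    return [x * y for y in ys], n, n


def par(r1, r2):
    """Parallel branches: work adds up, depth is the maximum."""
    (v1, w1, d1), (v2, w2, d2) = r1, r2
    return (v1, v2), w1 + w2, max(d1, d2)


def mult4(ys):
    (xs, zs), w, d = par(mult(496, ys), mult(8128, ys))
    a, wa, da = mult(5, xs)    # evaluated after the parallel binding
    b, wb, db = mult(10, zs)   # evaluated after mult(5, xs)
    return (a, b), w + wa + wb, d + da + db


ys = [1, 2, 3, 4]
_, work, depth = mult4(ys)
assert depth == 3 * len(ys)   # the correct bound
assert depth > 2 * len(ys)    # the bound 2|ys| would be violated
print(work, depth)
```
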

The  problem  in  the  previous  reasoning  is  that  we  doubled  the  part  of  the 
initial  potential  that  we  passed  on  for  later  use  in  the  two  parallel  branches  of 
the  parallel  composition.  To  fix  this  problem,  we  need  a  separate  analysis  of  the 
sizes  of  data  structures  and  the  cost  of  parallel  evaluations. 

In  this  paper,  we  propose  to  use  cost-free  type  judgements  to  reason  about 
the  size  changes  in  parallel  branches.  Instead  of  simply  using  the  initial  potential 
in  both  parallel  branches,  we  share  the  potential  between  the  two  branches  but 
analyze  the  two  branches  twice.  In  the  first  analysis,  we  only  pay  for  the  resource 
consumption of the first branch. In the second analysis, we only pay for the resource
consumption of the second branch.

A cost-free type judgement is like any other type judgement in amortized
resource analysis but uses the cost-free metric cf that assigns zero cost to every
evaluation step. For example, a cost-free typing of the function mult would
express that the initial potential can be passed to the result of the function. In
the cost-free typing judgement

$$x{:}\mathrm{int},\ ys{:}L(\mathrm{int});\ Q \vdash_{\mathrm{cf}} \mathrm{mult}(x, ys) : (L(\mathrm{int}), Q')$$

a valid instantiation of $Q$ and $Q'$ would correspond to the potential

$$p_Q(n,a) = |a| \quad \text{and} \quad p_{Q'}(a) = |a|\,.$$

The intuitive meaning is that in a call zs = mult(x, ys), the initial potential $|ys|$
can be transformed to the potential $|zs|$ of the result.

Using cost-free typings, we can now correctly reason about the depth of the
evaluation of mult4. We start with the initial potential $3|ys|$ and have to consider
two cases in the parallel binding. In the first case, we have to pay only for the resource
cost of mult(496, ys). So we share the initial potential and use $2|ys|$: $|ys|$ to pay
the cost of mult(496,ys) and $|ys|$ to assign the potential $|xs|$ to the result of the
application. The remainder $|ys|$ of the initial potential is used in a cost-free typing
of mult(8128, ys) where we assign the potential $|zs|$ to the result of the function
without paying any evaluation cost. In the second case, we derive a similar typing
in which the roles of the two function calls are switched. In both cases, we start
with the potential $3|ys|$ and end with the potential $|xs| + |zs|$. We use it to pay
for the two remaining calls of mult and have verified the correct bound.
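
The accounting of the first case can be written out as a worked equation. The
LaTeX below is only a restatement of the preceding paragraph, using
$|xs| = |zs| = |ys|$; the second case is symmetric.

```latex
\begin{align*}
% Case 1: pay only for mult(496, ys); mult(8128, ys) is typed cost-free.
3|ys| &= \underbrace{|ys|}_{\text{cost of } \mathrm{mult}(496, ys)}
       + \underbrace{|ys|}_{\text{becomes } |xs|}
       + \underbrace{|ys|}_{\text{cost-free, becomes } |zs|} \\
% After the parallel binding the remaining potential pays for the last two calls.
|xs| + |zs| &= \underbrace{|xs|}_{\text{cost of } \mathrm{mult}(5, xs)}
             + \underbrace{|zs|}_{\text{cost of } \mathrm{mult}(10, zs)}
\end{align*}
```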

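In the univariate case, using the notation from [3, 19], we could formulate
the type rule for parallel composition as follows. Here, the coefficients $Q$ are
not globally attached to a type or context but appear locally at list types such
as $L^q(\mathrm{int})$. The sharing operator $\Gamma \curlyvee (\Gamma_1, \Gamma_2, \Gamma_3)$ requires the sharing of the
potential in the context $\Gamma$ in the contexts $\Gamma_1$, $\Gamma_2$ and $\Gamma_3$. For instance, we have
$x{:}L^{q}(\mathrm{int}) \curlyvee (x{:}L^{q_1}(\mathrm{int}), x{:}L^{q_2}(\mathrm{int}), x{:}L^{q_3}(\mathrm{int}))$ if $q = q_1 + q_2 + q_3$.

$$\frac{\begin{array}{c}
\Gamma \curlyvee (\Delta_1, \Gamma_2, \Gamma') \qquad \Gamma \curlyvee (\Gamma_1, \Delta_2, \Gamma') \qquad \Gamma_1 \vdash_M e_1 : A_1 \qquad \Gamma_2 \vdash_M e_2 : A_2 \\
\Delta_1 \vdash_{\mathrm{cf}} e_1 : A_1 \qquad \Delta_2 \vdash_{\mathrm{cf}} e_2 : A_2 \qquad \Gamma', x_1{:}A_1, x_2{:}A_2 \vdash_M e : B
\end{array}}
{\Gamma \vdash_M \mathrm{par}\ x_1 = e_1\ \mathrm{and}\ x_2 = e_2\ \mathrm{in}\ e : B}$$

In the rule, the initial potential $\Gamma$ is shared twice using the sharing operator $\curlyvee$.
First, to pay the cost of evaluating $e_2$ and $e$, and to pass potential to $x_1$ using the
cost-free type judgement $\Delta_1 \vdash_{\mathrm{cf}} e_1 : A_1$. Second, to pay the cost of evaluating
$e_1$ and $e$, and to pass potential to $x_2$ via the judgement $\Delta_2 \vdash_{\mathrm{cf}} e_2 : A_2$.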

This  work  generalizes  the  idea  to  multivariate  resource  polynomials  for  which 
we also have to deal with mixed potential such as $|x_1| \cdot |x_2|$. The approach features
the  same  compositionality  as  the  sequential  version  of  the  analysis.  As  the 
experiments  in  Section  7  show,  the  analysis  works  well  for  many  typical  examples. 

The  use  of  cost-free  typings  to  separate  the  reasoning  about  size  changes  of 
data  structures  and  resource  cost  in  amortized  analysis  has  applications  that  go 
beyond  parallel  evaluations.  Similar  problems  arise  in  sequential  (and  parallel) 
programs  when  deriving  bounds  for  non-additive  cost  such  as  stack-space  usage 
or  recursion  depth.  We  envision  that  the  developed  technique  can  be  used  to 
derive  bounds  for  these  cost  measures  too. 

Other  Forms  of  Parallelism.  The  binary  parallel  binding  is  a  simple  yet 
powerful  form  of  parallelism.  However,  it  is  (for  example)  not  possible  to  directly 
implement NESL’s model of sequences that allows one to perform an operation for
every  element  in  the  sequence  in  constant  depth.  The  reason  is  that  the  parallel 
binding  would  introduce  a  linear  overhead. 

Nevertheless  it  is  possible  to  introduce  another  binary  parallel  binding  that  is 
semantically  equivalent  except  that  it  has  zero  depth  cost.  We  can  then  analyze 
more  powerful  parallelism  primitives  by  translating  them  into  code  that  uses  this 
cost-free  parallel  binding.  To  demonstrate  such  a  translation,  we  implemented 
NESL’s  [15]  parallel  sequence  comprehensions  in  RAML  (see  Section  6). 

4 Resource Polynomials and Annotated Types


In  this  section,  we  introduce  multivariate  resource  polynomials  and  annotated 
types.  Our  goal  is  to  systematically  describe  the  potential  functions  that  map  data 
structures  to  non-negative  rational  numbers.  Multivariate  resource  polynomials 
are  a  generalization  of  non-negative  linear  combinations  of  binomial  coefficients. 
They  have  properties  that  make  them  ideal  for  the  generation  of  succinct  linear 
constraint  systems  in  an  automatic  amortized  analysis.  The  presentation  might 
appear  quite  low  level  but  this  level  of  detail  is  necessary  to  describe  the  linear 
constraints  in  the  type  rules. 

Two  main  advantages  of  resource  polynomials  are  that  they  can  express  more 
precise bounds than non-negative linear combinations of standard polynomials
and  that  they  can  succinctly  describe  common  size  changes  of  data  that  appear 
in  construction  and  destruction  of  data.  More  explanations  can  be  found  in  the 
previous  literature  on  multivariate  amortized  resource  analysis  [13,  7]. 

4.1  Resource  Polynomials 

A resource polynomial maps a value of some data type to a nonnegative
rational number. Potential functions and thus resource bounds are always resource
polynomials.

Base Polynomials. For each data type $A$ we first define a set $P(A)$ of functions
$p : [\![A]\!] \to \mathbb{N}$ that map values of type $A$ to natural numbers. These base polynomials
form a basis (in the sense of linear algebra) of the resource polynomials for type
$A$. The resource polynomials for type $A$ are then given as nonnegative rational
linear combinations of the base polynomials. We define $P(A)$ as follows.

$$P(\mathrm{int}) = \{a \mapsto 1\} \qquad P(A_1 * A_2) = \{(a_1,a_2) \mapsto p_1(a_1) \cdot p_2(a_2) \mid p_i \in P(A_i)\}$$

$$P(L(A)) = \{\Sigma\Pi[p_1,\ldots,p_k] \mid k \in \mathbb{N},\ p_i \in P(A)\}$$

We have $\Sigma\Pi[p_1,\ldots,p_k]([a_1,\ldots,a_n]) = \sum_{1 \le j_1 < \cdots < j_k \le n} \prod_{1 \le i \le k} p_i(a_{j_i})$. Every
set $P(A)$ contains the constant function $v \mapsto 1$. For lists $L(A)$ this arises for
$k = 0$ (one element sum, empty product).

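For example, the function $\ell \mapsto \binom{|\ell|}{k}$ is in $P(L(A))$ for every $k \in \mathbb{N}$; simply take
$p_1 = \ldots = p_k = 1$ in the definition of $P(L(A))$. The function
$(\ell_1,\ell_2) \mapsto \binom{|\ell_1|}{k_1} \cdot \binom{|\ell_2|}{k_2}$
is in $P(L(A) * L(B))$ for every $k_1,k_2 \in \mathbb{N}$ and
$[\ell_1,\ldots,\ell_n] \mapsto \sum_{1 \le i < j \le n} \binom{|\ell_i|}{k_1} \cdot \binom{|\ell_j|}{k_2}$
is in $P(L(L(A)))$ for every $k_1,k_2 \in \mathbb{N}$.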
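
The list base polynomials can be evaluated directly from their definition as a
sum over increasing index tuples. The Python sketch below is an illustration
only: sum_prod and binom_len are hypothetical names, and base polynomials are
represented as ordinary functions.

```python
from itertools import combinations
from math import comb, prod


def sum_prod(ps):
    """Return the base polynomial SigmaPi[p_1, ..., p_k] on lists."""
    k = len(ps)

    def poly(values):
        # Sum over all index tuples j_1 < ... < j_k of the product p_i(a_{j_i}).
        return sum(
            prod(p(values[j]) for p, j in zip(ps, idx))
            for idx in combinations(range(len(values)), k)
        )

    return poly


def one(_value):
    """The constant base polynomial that maps every value to 1."""
    return 1


def binom_len(k):
    """The base polynomial l |-> binom(|l|, k), obtained with p_1 = ... = p_k = 1."""
    return sum_prod([one] * k)


xs = [10, 20, 30, 40, 50]
assert all(binom_len(k)(xs) == comb(len(xs), k) for k in range(7))

# Nested lists with k1 = k2 = 1: the sum over i < j of |l_i| * |l_j|.
nested = [[1, 2], [3], [4, 5, 6]]
print(sum_prod([binom_len(1), binom_len(1)])(nested))  # 2*1 + 2*3 + 1*3
```

For k = 0 the only index tuple is the empty one and the empty product is 1,
which is why every set of base polynomials for lists contains the constant
function.
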

Resource Polynomials. A resource polynomial $p : [\![A]\!] \to \mathbb{Q}^+_0$ for a data type $A$
is a non-negative linear combination of base polynomials, i.e., $p = \sum_{i=1,\ldots,m} q_i \cdot p_i$
for $q_i \in \mathbb{Q}^+_0$ and $p_i \in P(A)$. $R(A)$ is the set of resource polynomials for $A$.

An instructive, but not exhaustive, example is given by $R_n = R(L(\mathrm{int}) * \cdots * L(\mathrm{int}))$.
The set $R_n$ is the set of linear combinations of products of binomial
coefficients over variables $x_1,\ldots,x_n$, that is,
$R_n = \{\sum_{i=1}^{m} q_i \prod_{j=1}^{n} \binom{x_j}{k_{ij}} \mid q_i \in \mathbb{Q}^+_0,\ m \in \mathbb{N},\ k_{ij} \in \mathbb{N}\}$.
Concrete examples that illustrate the definitions follow in
the next subsection.

4.2 Annotated Types

To  relate  type  annotations  in  the  type  system  to  resource  polynomials,  we 
introduce  names  (or  indices)  for  base  polynomials.  These  names  are  also  helpful 
to  intuitively  explain  the  base  polynomials  of  a  given  type. 

Names For Base Polynomials. To assign a unique name to each base polynomial
we define the index set $I(A)$ to denote resource polynomials for a given data
type $A$. Essentially, $I(A)$ is the meaning of $A$ with every atomic type replaced
by the unit index $\circ$.

$$I(\mathrm{int}) = \{\circ\} \qquad I(A_1 * A_2) = \{(i_1, i_2) \mid i_1 \in I(A_1) \text{ and } i_2 \in I(A_2)\}$$
$$I(L(A)) = \{[i_1, \ldots, i_k] \mid k \ge 0,\ i_j \in I(A)\}$$

The degree $\deg(i)$ of an index $i \in I(A)$ is defined as follows.

$$\deg(\circ) = 0 \qquad \deg(i_1, i_2) = \deg(i_1) + \deg(i_2)$$
$$\deg([i_1, \ldots, i_k]) = k + \deg(i_1) + \cdots + \deg(i_k)$$
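
The degree definition can be read as a small recursive function. The following Python sketch is only an illustration; the encoding of indices (the string "o" for the unit index, tuples for pair indices, Python lists for list indices) is an assumption made here and is not part of the paper or of RAML.

```python
UNIT = "o"  # hypothetical stand-in for the unit index


def deg(index):
    """Degree of an index, following the three defining equations."""
    if index == UNIT:
        # deg(unit) = 0
        return 0
    if isinstance(index, tuple):
        # Pair (or flattened tuple) index: sum of the component degrees.
        return sum(deg(i) for i in index)
    if isinstance(index, list):
        # List index [i1, ..., ik]: k plus the degrees of the entries.
        return len(index) + sum(deg(i) for i in index)
    raise TypeError(f"not an index: {index!r}")


if __name__ == "__main__":
    print(deg([UNIT, UNIT]))    # list index with two unit entries
    print(deg([[UNIT, UNIT]]))  # index for lists of lists
```

Because the tuple case sums over all components, a flat tuple of four indices and its right-nested form receive the same degree.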

Let $I_k(A) = \{i \in I(A) \mid \deg(i) \le k\}$. The indices $i \in I_k(A)$ are an enumeration
of the base polynomials $p_i \in P(A)$ of degree at most $k$. For each $i \in I(A)$, we
define a base polynomial $p_i \in P(A)$ as follows: If $A = \mathrm{int}$ then $p_\circ(v) = 1$. If
$A = (A_1 * A_2)$ is a pair type and $v = (v_1, v_2)$ then $p_{(i_1, i_2)}(v) = p_{i_1}(v_1) \cdot p_{i_2}(v_2)$. If
$A = L(B)$ is a list type and $v \in [\![L(B)]\!]$ then $p_{[i_1, \ldots, i_m]}(v) = \Sigma\Pi[p_{i_1}, \ldots, p_{i_m}](v)$.

We use the notation $0_A$ (or just $0$) for the index in $I(A)$ such that $p_{0_A}(a) = 1$ for
all $a$. We have $0_{\mathrm{int}} = \circ$ and $0_{(A_1 * A_2)} = (0_{A_1}, 0_{A_2})$ and $0_{L(B)} = []$. If $A = L(B)$
for a data type $B$ then the index $[0, \ldots, 0] \in I(A)$ of length $n$ is denoted by just
$n$. We identify the index $(i_1, i_2, i_3, i_4)$ with the index $(i_1, (i_2, (i_3, i_4)))$.

Examples. First consider the type int. The index set $I(\mathrm{int}) = \{\circ\}$ only contains
the unit element because the only base polynomial for the type int is the constant
polynomial $p_\circ : \mathbb{Z} \to \mathbb{N}$ that maps every integer to 1, that is, $p_\circ(n) = 1$ for all
$n \in \mathbb{Z}$. In terms of resource-cost analysis this implies that the resource polynomials
cannot represent cost that depends on the value of an integer.

Now consider the type $L(\mathrm{int})$. The index set for lists of integers is $I(L(\mathrm{int})) =
\{[], [\circ], [\circ, \circ], \ldots\}$, the set of lists of unit indices $\circ$. The base polynomial $p_{[]} :
[\![L(\mathrm{int})]\!] \to \mathbb{N}$ is defined as $p_{[]}([a_1, \ldots, a_n]) = 1$ (one element sum and empty
product). More interestingly, we have $p_{[\circ]}([a_1, \ldots, a_n]) = \sum_{1 \le j \le n} 1 = n$ and
$p_{[\circ, \circ]}([a_1, \ldots, a_n]) = \sum_{1 \le j_1 < j_2 \le n} 1 = \binom{n}{2}$. In general, if $i_k = [\circ, \ldots, \circ]$ is a list
with $k$ unit indices then $p_{i_k}([a_1, \ldots, a_n]) = \sum_{1 \le j_1 < \cdots < j_k \le n} 1 = \binom{n}{k}$. The intuition
is that the base polynomial $p_{i_k}([a_1, \ldots, a_n])$ describes a constant resource cost
that arises for every ordered $k$-tuple $(a_{j_1}, \ldots, a_{j_k})$.

Finally, consider the type $L(L(\mathrm{int}))$ of lists of lists of integers. The corresponding
index set is $I(L(L(\mathrm{int}))) = \{[]\} \cup \{[i] \mid i \in I(L(\mathrm{int}))\} \cup \{[i_1, i_2] \mid i_1, i_2 \in
I(L(\mathrm{int}))\} \cup \cdots$. Again we have $p_{[]} : [\![L(L(\mathrm{int}))]\!] \to \mathbb{N}$ and $p_{[]}([a_1, \ldots, a_n]) = 1$.
Moreover we also get the binomial coefficients again: If the index $i_k = [[], \ldots, []]$
is a list of $k$ empty lists then $p_{i_k}([a_1, \ldots, a_n]) = \sum_{1 \le j_1 < \cdots < j_k \le n} 1 = \binom{n}{k}$. This
describes a cost that would arise in a program that computes something of constant
cost for tuples of inner lists (e.g., sorting with respect to the smallest head
elements). However, the base polynomials can also refer to the lengths of the inner
lists. For instance, we have $p_{[[\circ, \circ]]}([a_1, \ldots, a_n]) = \sum_{1 \le i \le n} \binom{|a_i|}{2}$, which represents
a quadratic cost for every inner list (e.g., sorting the inner lists). This is not
to be confused with the base polynomial $p_{[[\circ], [\circ]]}([a_1, \ldots, a_n]) = \sum_{1 \le i < j \le n} |a_i| |a_j|$,
which can be used to account for the cost of the comparisons in a lexicographic
sorting of the outer list.
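
The base polynomials for integers, pairs, and lists can be evaluated directly from their definition. The sketch below is a hedged illustration in Python, not RAML's implementation: the index encoding ("o" for the unit index, tuples for pairs, lists for list indices) and the function name `base_poly` are assumptions introduced here. The list case sums, over all strictly increasing position tuples, the product of the component polynomials.

```python
from itertools import combinations
from math import prod

UNIT = "o"  # hypothetical stand-in for the unit index


def base_poly(index, value):
    """Evaluate the base polynomial named by `index` on `value`."""
    if index == UNIT:
        # Integers only admit the constant polynomial 1.
        return 1
    if isinstance(index, tuple):
        # Pair index: product of the component polynomials.
        return prod(base_poly(i, v) for i, v in zip(index, value))
    # List index [i1, ..., ik] on a list [a1, ..., an]:
    # sum over positions j1 < ... < jk of prod_i p_i(a_ji).
    # combinations() yields elements in increasing position order, and
    # for k = 0 it yields one empty tuple (one-element sum, empty product).
    return sum(
        prod(base_poly(i, a) for i, a in zip(index, chosen))
        for chosen in combinations(value, len(index))
    )


if __name__ == "__main__":
    outer = [[1, 2, 3], [4], [5, 6]]
    print(base_poly([UNIT, UNIT], [5, 7, 9, 11]))  # binomial(4, 2)
    print(base_poly([[UNIT, UNIT]], outer))        # sum of binomial(|a_i|, 2)
    print(base_poly([[UNIT], [UNIT]], outer))      # sum over i < j of |a_i| * |a_j|
```

Enumerating all position tuples is exponential in general; the sketch is meant to mirror the definition, not to be efficient.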

Annotated  Types  and  Potential  Functions.  We  use  the  indices  and  base 
polynomials  to  define  type  annotations  and  resource  polynomials.  We  then  give 
examples  to  illustrate  the  definitions. 

A type annotation for a data type $A$ is defined to be a family

$$Q_A = (q_i)_{i \in I(A)} \text{ with } q_i \in \mathbb{Q}^+_0$$

We say $Q_A$ is of degree (at most) $k$ if $q_i = 0$ for every $i \in I(A)$ with $\deg(i) > k$.
An annotated data type is a pair $(A, Q_A)$ of a data type $A$ and a type annotation
$Q_A$ of some degree $k$.

Let $H$ be a heap and let $\ell$ be a location with $H \models \ell \mapsto a : A$ for a data
type $A$. Then the type annotation $Q_A$ defines the potential $\Phi_H(\ell : (A, Q_A)) =
\sum_{i \in I(A)} q_i \cdot p_i(a)$. If $a \in [\![A]\!]$ and $Q$ is a type annotation for $A$ then we also write
$\Phi(a : (A, Q))$ for $\sum_{i \in I(A)} q_i \cdot p_i(a)$.

Let, for example, $Q = (q_i)_{i \in I(L(\mathrm{int}))}$ be an annotation for the type $L(\mathrm{int})$ and
let $q_{[]} = 2$, $q_{[\circ]} = 2.5$, $q_{[\circ, \circ, \circ]} = 8$, and $q_i = 0$ for all other $i \in I(L(\mathrm{int}))$. Then we
have $\Phi([a_1, \ldots, a_n] : (L(\mathrm{int}), Q)) = 2 + 2.5n + 8\binom{n}{3}$.
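
The example potential can be checked numerically. The following Python sketch assumes an encoding chosen here for illustration: an annotation for the type of integer lists is a dictionary that maps k (the index consisting of k unit indices) to its coefficient, with all missing entries read as 0. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def potential_list_int(annotation, xs):
    """Potential of an integer list under an annotation {k: q_k}.

    The base polynomial for the index with k unit entries is binomial(n, k),
    where n is the length of the list, so the potential is a weighted sum
    of binomial coefficients.
    """
    n = len(xs)
    return sum(q * comb(n, k) for k, q in annotation.items())


# The annotation from the example: q_[] = 2, q_[o] = 2.5, q_[o,o,o] = 8.
Q = {0: Fraction(2), 1: Fraction(5, 2), 3: Fraction(8)}

if __name__ == "__main__":
    for n in range(6):
        print(n, potential_list_int(Q, list(range(n))))
```

For a list of length 4 this evaluates 2 + 2.5 * 4 + 8 * 4 = 44, in agreement with the closed form above.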

The Potential of a Context. For use in the type system we need to extend
the definition of resource polynomials to typing contexts. We treat a context like
a tuple type. Let $\Gamma = x_1{:}A_1, \ldots, x_n{:}A_n$ be a typing context and let $k \in \mathbb{N}$. The
index set $I(\Gamma)$ is defined through $I(\Gamma) = \{(i_1, \ldots, i_n) \mid i_j \in I(A_j)\}$.

The degree of $i = (i_1, \ldots, i_n) \in I(\Gamma)$ is defined through $\deg(i) = \deg(i_1) +
\cdots + \deg(i_n)$. As for data types, we define $I_k(\Gamma) = \{i \in I(\Gamma) \mid \deg(i) \le k\}$. A
type annotation $Q$ for $\Gamma$ is a family $Q = (q_i)_{i \in I_k(\Gamma)}$ with $q_i \in \mathbb{Q}^+_0$. We denote a
resource-annotated context with $\Gamma; Q$. Let $H$ be a heap and $V$ be a stack with
$H \models V : \Gamma$ where $H \models V(x_j) \mapsto a_{x_j} : \Gamma(x_j)$.

The potential of an annotated context $\Gamma; Q$ with respect to the environment
$H$ and $V$ is $\Phi_{V,H}(\Gamma; Q) = \sum_{(i_1, \ldots, i_n) \in I_k(\Gamma)} q_{(i_1, \ldots, i_n)} \prod_{j=1}^{n} p_{i_j}(a_{x_j})$. In particular, if $\Gamma =
\emptyset$ then $I_k(\Gamma) = \{()\}$ and $\Phi_{V,H}(\Gamma; q_{()}) = q_{()}$. We sometimes also write $q_0$ for $q_{()}$.

5  Type  System  for  Bounds  on  the  Depth 

In  this  section,  we  formally  describe  the  novel  resource-aware  type  system.  We 
focus  on  the  type  judgement  and  explain  the  rules  that  are  most  important  for 
handling  parallel  evaluation.  The  full  type  system  is  given  in  the  extended  version 
of  this  article  [17]. 

The  main  theorem  of  this  section  proves  the  soundness  of  the  type  system 
with  respect  to  the  depths  of  evaluations  as  defined  by  the  operational  big-step 
semantics.  The  soundness  holds  for  terminating  and  non-terminating  evaluations. 

Type Judgments. The typing rules in Figure 2 define a resource-annotated
typing judgment of the form

$$\Sigma; \Gamma; \{Q_1, \ldots, Q_n\} \vdash^M e : (A, Q')$$

where $M$ is a metric, $n \in \{1, 2\}$, $e$ is an expression, $\Sigma$ is a resource-annotated
signature (see below), $(\Gamma; Q_i)$ is a resource-annotated context for every $i \in
\{1, \ldots, n\}$, and $(A, Q')$ is a resource-annotated data type. The intended meaning
of this judgment is the following. If there are more than $\Phi(\Gamma; Q_i)$ resource units
available for every $i \in \{1, \ldots, n\}$ then this is sufficient to pay for the depth of the
evaluation of $e$ under the metric $M$. In addition, there are more than $\Phi(v : (A, Q'))$
resource units left if $e$ evaluates to a value $v$.

In outermost judgements, we are only interested in the case where $n = 1$ and
the judgement is equivalent to the similar judgement for sequential programs [7].
The form in which $n = 2$ is introduced in the type rule T:Par for parallel
bindings and eliminated by multiple applications of the sharing rule T:Share
(more explanations follow).

The type judgement is affine in the sense that every variable in a context
$\Gamma$ can be used at most once in the expression $e$. Of course, we also have to
deal with expressions in which a variable occurs more than once. To account for
multiple variable uses we use the sharing rule T:Share that doubles a variable
in a context without increasing the potential of the context.

As usual $\Gamma_1, \Gamma_2$ denotes the union of the contexts $\Gamma_1$ and $\Gamma_2$ provided that
$\mathrm{dom}(\Gamma_1) \cap \mathrm{dom}(\Gamma_2) = \emptyset$. We thus have the implicit side condition $\mathrm{dom}(\Gamma_1) \cap
\mathrm{dom}(\Gamma_2) = \emptyset$ whenever $\Gamma_1, \Gamma_2$ occurs in a typing rule. In particular, writing $\Gamma =
x_1{:}A_1, \ldots, x_k{:}A_k$ means that the variables $x_i$ are pairwise distinct.

Programs with Annotated Types. Resource-annotated first-order types have
the form $(A, Q) \to (B, Q')$ for annotated data types $(A, Q)$ and $(B, Q')$. A
resource-annotated signature $\Sigma$ is a finite, partial mapping of function identifiers
to sets of resource-annotated first-order types. A program with resource-annotated
types for the metric $M$ consists of a resource-annotated signature $\Sigma$
and a family of expressions with variable identifiers $(e_f, y_f)_{f \in \mathrm{dom}(\Sigma)}$ such that
$\Sigma; y_f{:}A; Q \vdash^M e_f : (B, Q')$ for every function type $(A, Q) \to (B, Q') \in \Sigma(f)$.

Sharing. Let $\Gamma, x_1{:}A, x_2{:}A; Q$ be an annotated context. The sharing operation
$\curlyvee Q$ defines an annotation for a context of the form $\Gamma, x{:}A$. It is used when the
potential is split between multiple occurrences of a variable. Details can be found
in the full version of the article.

Typing  Rules.  Figure  2  shows  the  annotated  typing  rules  that  are  most 
relevant  for  parallel  evaluation.  Most  of  the  other  rules  are  similar  to  the  rules 
for  multivariate  amortized  analysis  for  sequential  programs  [13,20].  The  main 
difference is that the rules here operate on annotations that are singleton sets
{Q}  instead  of  the  usual  context  annotations  Q. 

In the rules T:Let and T:Par, the result of the evaluation of an expression $e$
is bound to a variable $x$. The problem that arises is that the resulting annotated
context $\Delta, x{:}A; Q'$ features potential functions whose domain consists of data
that is referenced by $x$ as well as data that is referenced by $\Delta$. This potential
has to be related to data that is referenced by $\Delta$ and the free variables in $e$.

To express the relations between mixed potentials before and after the evaluation
of $e$, we introduce a new auxiliary binding judgement of the form

$$\Sigma; \Gamma, \Delta; Q \vdash^M e \rightsquigarrow \Delta, x{:}A; Q'$$

in the rule B:Bind. The intuitive meaning of the judgement is the following.
Assume that $e$ is evaluated in the context $\Gamma, \Delta$, that $\mathrm{FV}(e) \subseteq \mathrm{dom}(\Gamma)$, and
that $e$ evaluates to a value that is bound to the variable $x$. Then the initial
potential $\Phi(\Gamma, \Delta; Q)$ is larger than the cost of evaluating $e$ in the metric $M$ plus
the potential of the resulting context $\Phi(\Delta, x{:}A; Q')$.

Fig. 2. Selected novel typing rules for annotated types and the binding rule for
multivariate variable binding (rules T:Let, T:Par, T:Share, and B:Bind).

The rule T:Par for parallel bindings par $x_1 = e_1$ and $x_2 = e_2$ in $e$ is the main
novelty in the type system. The idea is that we type the expressions $e_1$ and
$e_2$ twice using the new binding judgement. In the first group of bindings, we
account for the cost of $e_1$ and derive a context $\Gamma_2, \Delta, x_1{:}A_1; P'$ in which the
result of the evaluation of $e_1$ is bound to $x_1$. This context is then used to bind
the result of evaluating $e_2$ in the context $\Delta, x_1{:}A_1, x_2{:}A_2; R$ without paying for
the resource consumption. In the second group of bindings, we also derive the
context $\Delta, x_1{:}A_1, x_2{:}A_2; R$ but pay for the cost of evaluating $e_2$ instead of $e_1$.
The type annotations $Q_1$ and $Q_2$ for the initial context $\Gamma = \Gamma_1, \Gamma_2, \Delta$ establish
a bound on the depth $d$ of evaluating the whole parallel binding: If the depth
of evaluating $e_1$ is larger than the depth of evaluating $e_2$ then $\Phi(\Gamma; Q_1) \ge d$.
Otherwise we have $\Phi(\Gamma; Q_2) \ge d$. If the parallel binding evaluates to a value $v$
then we have additionally that $\max(\Phi(\Gamma; Q_1), \Phi(\Gamma; Q_2)) \ge d + \Phi(v : (B, Q'))$.
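
The case distinction behind this bound can be illustrated with plain arithmetic. The Python sketch below is a simplified model and not the type rule itself: it assumes that the depth of a parallel binding is the maximum of the depths of the two bound expressions plus the depth of the body, that the work is the sum, and that any constant overhead of the metric is zero. The function names and the two numbers standing for the potentials under the two annotations are hypothetical.

```python
def par_work(w1, w2, w_body):
    """Work of a parallel binding: both branches and the body are paid for."""
    return w1 + w2 + w_body


def par_depth(d1, d2, d_body):
    """Depth of a parallel binding: only the deeper branch counts."""
    return max(d1, d2) + d_body


def covers_depth(phi_q1, phi_q2, d1, d2, d_body):
    """Check that the larger of the two potentials bounds the depth.

    phi_q1 is assumed to pay for the first branch and the body,
    phi_q2 for the second branch and the body.
    """
    assert phi_q1 >= d1 + d_body and phi_q2 >= d2 + d_body
    return max(phi_q1, phi_q2) >= par_depth(d1, d2, d_body)


if __name__ == "__main__":
    # First branch is deeper, so the first potential is the relevant one.
    print(par_depth(5, 3, 2), par_work(5, 3, 2))
    print(covers_depth(7, 5, 5, 3, 2))
```

Whichever branch is deeper, the annotation that pays for that branch already covers the whole depth, which is why two annotations with different splits of the potential are tracked.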

It is important that the annotations $Q_1$ and $Q_2$ of the initial context $\Gamma_1, \Gamma_2, \Delta$
can differ. The reason is that we have to allow a different sharing of potential in
the two groups of bindings. If we required $Q_1 = Q_2$ then the system would
be too restrictive. However, each type derivation has to establish the equality
of the two annotations directly after the use of T:Par by multiple uses of the
sharing rule T:Share. Note that T:Par is the only rule that can introduce a
non-singleton set $\{Q_1, Q_2\}$ of context annotations.

T:Share has to be applied to expressions that contain a variable twice ($x$ in
the rule). The sharing operation $\curlyvee P$ transfers the annotation $P$ for the context
$\Gamma, x_1{:}A, x_2{:}A$ into an annotation $Q$ for the context $\Gamma, x{:}A$ without loss of potential.
This is crucial for the accuracy of the analysis since instances of T:Share are
quite frequent in typical examples. The remaining rules are affine in the sense
that they assume that every variable occurs at most once in the typed expression.

T:Share is the only rule whose premise allows judgements that contain a
non-singleton set $\{P_1, \ldots, P_m\}$ of context annotations. It has to be applied to
produce a judgement with singleton set $\{Q\}$ before any of the other rules can be
applied. The idea is that we always have $n \le m$ for the set $\{Q_1, \ldots, Q_n\}$ and the
sharing operation $\curlyvee_i$ is used to unify the different $P_i$.

Soundness. The operational big-step semantics with partial evaluations makes
it possible to state and prove a strong soundness result. An annotated type
judgment for an expression $e$ establishes a bound on the depth of all evaluations
of $e$ in a well-formed environment, regardless of whether these evaluations diverge
or fail. Moreover, the soundness theorem also states a stronger property for
terminating evaluations. If an expression $e$ evaluates to a value $v$ in a well-formed
environment then the difference between initial and final potential is an upper
bound on the depth of the evaluation.

Theorem 3 (Soundness). If $H \models V : \Gamma$ and $\Sigma; \Gamma; \mathcal{Q} \vdash^M e : (B, Q')$ then there
exists a $Q \in \mathcal{Q}$ such that the following holds.

1. If $V, H \vdash^M e \Downarrow (\ell, H') \mid (w, d)$ then $d \le \Phi_{V,H}(\Gamma; Q) - \Phi_{H'}(\ell : (B, Q'))$.

2. If $V, H \vdash^M e \Downarrow \rho \mid (w, d)$ then $d \le \Phi_{V,H}(\Gamma; Q)$.

Theorem 3 is proved by a nested induction on the derivation of the evaluation
judgment and the type judgment $\Gamma; Q \vdash e : (B, Q')$. The inner induction on the
type judgment is needed because of the structural rules. There is one proof for
all possible instantiations of the resource constants.

The  proof  of  most  rules  is  very  similar  to  the  proof  of  the  rules  for  multivariate 
resource  analysis  for  sequential  programs  [7].  The  main  novelty  is  the  treatment 
of  parallel  evaluation  in  the  rule  T:Par  which  we  described  previously. 

If the metric $M$ is simple (all constants are 1) then it follows from Theorem
3 that the bounds on the depth also prove the termination of programs.

Corollary 1. Let $M$ be a simple metric. If $H \models V : \Gamma$ and $\Sigma; \Gamma; Q \vdash^M e : (A, Q')$
then there are $w \in \mathbb{N}$ and $d \le \Phi_{V,H}(\Gamma; Q)$ such that $V, H \vdash^M e \Downarrow (\ell, H') \mid (w, d)$
for some $\ell$ and $H'$.

Type  Inference.  In  principle,  type  inference  consists  of  four  steps.  First,  we 
perform  a  classic  type  inference  for  the  simple  types  such  as  nat  array.  Second, 
we  fix  a  maximal  degree  of  the  bounds  and  annotate  all  types  in  the  derivation  of 
the  simple  types  with  variables  that  correspond  to  type  annotations  for  resource 
polynomials  of  that  degree.  Third,  we  generate  a  set  of  linear  inequalities,  which 
express  the  relationships  between  the  added  annotation  variables  as  specified  by 
the type rules. Fourth, we solve the inequalities with an LP solver such as CLP.
A  solution  of  the  linear  program  corresponds  to  a  type  derivation  in  which  the 
variables  in  the  type  annotations  are  instantiated  according  to  the  solution. 

In  practice,  the  type  inference  is  slightly  more  complex.  Most  importantly, 
we  have  to  deal  with  resource-polymorphic  recursion  in  many  examples.  This 
means  that  we  need  a  type  annotation  in  the  recursive  call  that  differs  from  the 
annotation  in  the  argument  and  result  types  of  the  function.  To  infer  such  types 
we  successively  infer  type  annotations  of  higher  and  higher  degree.  Details  can  be 
found  in  previous  work  [21].  Moreover,  we  have  to  use  algorithmic  versions  of  the 
type  rules  in  the  inference  in  which  the  non-syntax-directed  rules  are  integrated 
into  the  syntax-directed  ones  [7].  Finally,  we  use  several  optimizations  to  reduce 
the  number  of  generated  constraints.  See  [7]  for  an  example  type  derivation. 

6  Nested  Data  Parallelism 

The techniques that we describe in this work for a minimal functional language
scale to more advanced parallel languages such as Blelloch's NESL [15].

To  describe  the  novel  type  analysis  in  this  paper,  we  use  a  binary  binding 
construct  to  introduce  parallelism.  In  NESL,  parallelism  is  introduced  via  built-in 
functions  on  sequences  as  well  as  parallel  sequence  comprehension  that  is  similar 
to  Haskell’s  list  comprehension.  The  depth  of  all  built-in  sequence  functions  such 
as  append  and  sum  is  constant  in  NESL.  Similarly,  the  depth  overhead  of  the 
parallel  sequence  comprehension  is  constant  too.  Of  course,  it  is  possible  to  define 
equivalent  functions  in  RAML.  However,  the  depth  would  often  be  linear  since 
we,  for  instance,  have  to  sequentially  form  the  resulting  list. 

Nevertheless,  the  user  definable  resource  metrics  in  RAML  make  it  easy  to 
introduce  built-in  functions  and  language  constructs  with  customized  work  and 
depth. For instance, we could implement NESL's append like the recursive append
in RAML but use a metric inside the function body in which all evaluation steps
have depth zero. Then the depth of the evaluation of append(x, y) is constant
and the work is linear in $|x|$.

To demonstrate this ability of our approach, we implemented parallel list
comprehensions, NESL's most powerful construct for parallel computations. A
list comprehension has the form $\{ e : x_1 \text{ in } e_1; \ldots; x_n \text{ in } e_n \mid e_b \}$, where $e$ is
an expression, $e_1, \ldots, e_n$ are expressions of some list type, and $e_b$ is a boolean
expression. The semantics is that we bind $x_1, \ldots, x_n$ successively to the elements
of the lists $e_1, \ldots, e_n$ and evaluate $e_b$ and $e$ under these bindings. If $e_b$ evaluates
to true under a binding then we include the result of $e$ under that binding in the
resulting list. In other words, the above list comprehension is equivalent to the
Haskell expression [ e | (x_1, ..., x_n) <- zip_n e_1 ... e_n, e_b ].

The work of evaluating $\{ e : x_1 \text{ in } e_1; \ldots; x_n \text{ in } e_n \mid e_b \}$ is the sum of the cost of
evaluating $e_1, \ldots, e_{n-1}$ and $e_n$ plus the sum of the cost of evaluating $e_b$ and $e$
with the successive bindings to the elements of the results of the evaluation of
$e_1, \ldots, e_n$. The depth of the evaluation is the sum of the cost of evaluating $e_1, \ldots, e_{n-1}$
and $e_n$ plus the maximum of the cost of evaluating $e_b$ and $e$ with the successive
bindings to the elements of the results of the $e_i$.
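
The semantics and the two cost measures of a list comprehension can be sketched in a few lines. The Python code below is an illustration under assumptions made here: the source expressions are given as already evaluated lists, the filter and the body are Python functions, and the cost of each part is supplied by the caller as a number. None of the names are taken from RAML or NESL.

```python
def comprehension(body, guard, *lists):
    """Bind the elements of the lists position by position (like zip),
    keep the bindings that satisfy the guard, and evaluate the body."""
    return [body(*xs) for xs in zip(*lists) if guard(*xs)]


def comprehension_cost(source_costs, binding_costs):
    """Work and depth of a list comprehension.

    source_costs:  cost of evaluating each source expression e_1, ..., e_n
    binding_costs: cost of evaluating guard and body for each binding
    """
    sequential_part = sum(source_costs)
    work = sequential_part + sum(binding_costs)
    # The bindings are evaluated in parallel, so only the most expensive
    # one contributes to the depth.
    depth = sequential_part + max(binding_costs, default=0)
    return work, depth


if __name__ == "__main__":
    print(comprehension(lambda x, y: x * y, lambda x, y: x < y,
                        [1, 5, 3], [4, 2, 6]))
    print(comprehension_cost([1, 1], [3, 7, 2]))
```

The only difference between the two measures is whether the per-binding costs are summed or maximized, which is what makes the depth overhead of a comprehension constant when every binding has constant cost.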

Function Name /                 Computed Depth Bound /    Run Time   Asym. Behav.
Function Type                   Computed Work Bound

find                            12m + 29n + 22            0.38 s     O(m+n)
L(int) * L(int) -> L(L(int))    20mn + 18m + 9n + 16      0.41 s     O(nm)

Table 1. Compilation of Computed Depth and Work Bounds.


7  Experimental  Evaluation 

We implemented the developed automatic depth analysis in Resource Aware ML
(RAML). The implementation consists mainly of adding the syntactic form for the
parallel binding and the parallel list comprehensions together with the treatment
in the parser, the interpreter, and the resource-aware type system. RAML is
publicly available for download and through a user-friendly online interface [16].
On the project web page you can also find the source code of all example programs
and of RAML itself.

We  used  the  implementation  to  perform  an  experimental  evaluation  of  the 
analysis  on  typical  examples  from  functional  programming.  In  the  compilation 
of  our  results  we  focus  on  examples  that  have  a  different  asymptotic  worst-case 
behavior  in  parallel  and  sequential  evaluation.  In  many  other  cases,  the  worst-case 
behavior  only  differs  in  the  constant  factors.  Also  note  that  many  of  the  classic 
examples  of  Blelloch  [10] — like  quick  sort — have  a  better  asymptotic  average 
behavior  in  parallel  evaluation  but  the  same  asymptotic  worst-case  behavior  in 
parallel  and  sequential  cost. 

Table  1  contains  a  representative  compilation  of  our  experimental  results.  For 
each  analyzed  function,  it  shows  the  function  type,  the  computed  bounds  on 
the  work  and  the  depth,  the  run  time  of  the  analysis  in  seconds  and  the  actual 
asymptotic behavior of the function. The experiments were performed on an iMac
with a 3.4 GHz Intel Core i7 and 8 GB memory. As the LP solver we used IBM's
CPLEX, and the constraint solving takes about 60% of the overall run time of the
prototype on average. The computed bounds are simplified multivariate resource
polynomials that are presented to the user by RAML. Note that RAML also
outputs the (unsimplified) multivariate resource polynomials. The variables in
the computed bounds correspond to the sizes of different parts of the input. As a
naming convention we use the order n, m, x, y, z, u of variables to name the sizes
in a depth-first way: n is the size of the first argument, m is the maximal size of
the elements of the first argument, x is the size of the second argument, etc.

All bounds are asymptotically tight if the tight bound is representable by a
multivariate resource polynomial. For example, the exponential work bound for
fib and the logarithmic bounds for bitonic_sort are not representable as a resource
polynomial. Another example is the loose depth bound for dyad_all where we
would need the base function $\max_{1 \le i \le n} m_i$ but only have $\sum_{1 \le i \le n} m_i$.

Matrix Operations. To study programs that use nested data structures we
implemented several matrix operations for matrices that are represented by lists
of lists of integers. The implemented operations include the dyadic product
from Section 3 (dyad), transposition of matrices (transpose, see [16]), addition of
matrices (m_add, see [16]), and multiplication of matrices (m_mult1 and m_mult2).

To  demonstrate  the  compositionality  of  the  analysis,  we  have  implemented 
two  more  involved  functions  for  matrices.  The  function  dyad_all  computes  the 
dyadic  product  (using  dyad)  of  all  ordered  pairs  of  the  inner  lists  in  the  argument. 
The function m_mult_pairs computes the products $M_1 \cdot M_2$ (using m_mult1) of all
pairs of matrices such that $M_1$ is in the first list of the argument and $M_2$ is in
the second list of the argument.

Sorting  Algorithms.  The  sorting  algorithms  that  we  implemented  include  quick 
sort  and  bitonic  sort  for  lists  of  integers  (quicksort  and  bitonic_sort,  see  [16]). 

The  analysis  computes  asymptotically  tight  quadratic  bounds  for  the  work 
and depth of quick sort. The asymptotically tight bounds for the work and depth
of bitonic sort are $O(n \log n)$ and $O(n \log^2 n)$, respectively, and thus cannot be
expressed by polynomials. However, the analysis computes quadratic and cubic
bounds that are asymptotically optimal if we only consider polynomial bounds.

More  interesting  are  sorting  algorithms  for  lists  of  lists,  where  the  comparisons 
need  linear  instead  of  constant  time.  In  these  algorithms  we  can  often  perform 
the  comparisons  in  parallel.  For  instance,  the  analysis  computes  asymptotically 
tight bounds for quick sort for lists of lists of integers (quicksort_list, see Table 1).

Set Operations. We implemented sets as unsorted lists without duplicates.
Most  list  operations  such  as  intersection  (Table  1),  difference  (see  [16]),  and 
union  (see  [16])  have  linear  depth  and  quadratic  work.  The  analysis  finds  these 
asymptotically  tight  bounds. 

The  function  product  computes  the  Cartesian  product  of  two  sets.  Work 
and  depth  of  product  are  both  linear  and  the  analysis  finds  asymptotically  tight 
bounds. However, the constant factors in the parallel evaluation are much smaller.

Miscellaneous. The function max_weight (Table 1) computes the maximal weight
of  a  (connected)  sublist  of  an  integer  list.  The  weight  of  a  list  is  simply  the  sum 
of  its  elements.  The  work  of  the  algorithm  is  quadratic  but  the  depth  is  linear. 
Finally, there is a large class of programs that have non-polynomial work
but  polynomial  depth.  Since  the  analysis  can  only  compute  polynomial  bounds 
we  can  only  derive  bounds  on  the  depth  for  such  programs.  A  simple  example 
in  Table  1  is  the  function  fib  that  computes  the  Fibonacci  numbers  without 
memoization. 
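
The gap between work and depth for such a function can be made concrete with a small cost model. The Python sketch below is an illustration and not RAML's metric: it assumes that every call costs one unit and that the two recursive calls are evaluated in parallel. The helper `fib_cost` is hypothetical, and its cache only speeds up the computation of the cost figures; the modeled program itself is the unmemoized one.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_cost(n):
    """Return (work, depth) of the naive recursive Fibonacci function,
    charging one unit per call and running both recursive calls in parallel."""
    if n < 2:
        return (1, 1)
    w1, d1 = fib_cost(n - 1)
    w2, d2 = fib_cost(n - 2)
    # Work adds up over both branches; depth follows the longer branch.
    return (1 + w1 + w2, 1 + max(d1, d2))


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        work, depth = fib_cost(n)
        print(n, work, depth)
```

Under these assumptions the work grows like the Fibonacci numbers themselves while the depth grows linearly in n, so only the depth admits a polynomial bound.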

Parallel List Comprehensions. The aforementioned examples are all implemented
without using parallel list comprehensions. Parallel list comprehensions
have a better asymptotic behavior than semantically-equivalent recursive functions
in RAML's current resource metric for evaluation steps.

A  simple  example  is  the  function  dyad_comp  which  is  equivalent  to  dyad  and 
which  is  implemented  with  the  expression  {{x  *  y  :  y  in  ys}  :  x  in  xs}.  As  listed 
in Table 1, the depth of dyad_comp is constant while the depth of dyad is linear.
RAML  computes  tight  bounds. 

A more involved example is the function find that finds a given integer list
(needle) in another list (haystack). It returns the starting indices of each occurrence
of the needle in the haystack. The algorithm is described by Blelloch [15]
and cleverly uses parallel list comprehensions to perform the search in parallel.
RAML computes asymptotically tight bounds on the work and depth.

Discussion. Our experiments show that the range of the analysis is not reduced
when  deriving  bounds  on  the  depth:  The  prototype  implementation  can  always 
infer  bounds  on  the  depth  of  a  program  if  it  can  infer  bounds  on  the  sequential 
version  of  the  program.  The  derivation  of  bounds  for  parallel  programs  is  also 
almost  as  efficient  as  the  derivation  of  bounds  for  sequential  programs. 

We  experimentally  compared  the  derived  worst-case  bounds  with  the  measured 
work  and  depth  of  evaluations  with  different  inputs.  In  most  cases,  the  derived 
bounds  on  the  depth  are  asymptotically  tight  and  the  constant  factors  are  close 
or  equal  to  the  optimal  ones.  As  a  representative  example,  the  full  version  of  the 
article  contains  plots  of  our  experiments  for  quick  sort  for  lists  of  lists. 

8  Related  Work 

Automatic amortized resource analysis was introduced by Hofmann and Jost for
a strict first-order functional language [3]. The technique has been applied to
higher-order functional programs [22], to derive stack-space bounds for functional
programs [23], to functional programs with lazy evaluation [4], to object-oriented
programs [24, 25], and to low-level code by integrating it with separation logic [26].
All the aforementioned amortized-analysis-based systems are limited to linear
bounds. The polynomial potential functions that we use in this paper were
introduced by Hoffmann et al. [19, 13, 7]. In contrast to this work, none of the
previous works on amortized analysis considered parallel evaluation. The main
technical innovation of this work is the new rule for parallel composition, which is
not straightforward. The smooth integration of this rule in the existing framework
of multivariate amortized resource analysis is one of the main advantages of our work.

Type  systems  for  inferring  and  verifying  cost  bounds  for  sequential  programs 
have been extensively studied. Vasconcelos et al. [27, 1] described an automatic
analysis system that is based on sized-types [28] and derives linear bounds for
higher-order sequential functional programs. Dal Lago et al. [29, 30] introduced
linear dependent types to obtain a complete analysis system for the time complexity
of the call-by-name and call-by-value lambda calculus. Crary and Weirich [31]
presented  a  type  system  for  specifying  and  certifying  resource  consumption. 
Danielsson  [32]  developed  a  library,  based  on  dependent  types  and  manual  cost 
annotations,  that  can  be  used  for  complexity  analyses  of  functional  programs. 
We  are  not  aware  of  any  type-based  analysis  systems  for  parallel  evaluation. 

Classically,  cost  analyses  are  often  based  on  deriving  and  solving  recurrence 
relations.  This  approach  was  pioneered  by  Wegbreit  [33]  and  has  been  extensively 
studied  for  sequential  programs  written  in  imperative  languages  [6,  34]  and 
functional  languages  [35,2]. 

In  comparison,  there  has  been  little  work  done  on  the  analysis  of  parallel 
programs.  Albert  et  al.  [36]  use  recurrence  relations  to  derive  cost  bounds  for 
concurrent object-oriented programs. However, their model of concurrent
imperative programs that communicate over shared memory, and the cost measure
they use, are quite different from the depth of functional programs that we study.

The  only  article  on  using  recurrence  relations  for  deriving  bounds  on  parallel 
functional  programs  that  we  are  aware  of  is  a  technical  report  by  Zimmermann  [37] . 
The  programs  that  were  analyzed  in  this  work  are  fairly  simple  and  more  involved 
programs  such  as  sorting  algorithms  seem  to  be  beyond  its  scope.  Additionally,  the 
technique  does  not  provide  the  compositionality  of  amortized  resource  analysis. 

Trinder  et  al.  [38]  give  a  survey  of  resource  analysis  techniques  for  parallel  and 
distributed  systems.  However,  they  focus  on  the  usage  of  analyses  for  sequential 
programs  to  improve  the  coordination  in  parallel  systems.  Abstract  interpretation 
based  approaches  to  resource  analysis  [5,  39]  are  limited  to  sequential  programs. 

Finally,  there  exists  research  that  studies  cost  models  to  formally  analyze 
parallel  programs.  Blelloch  and  Greiner  [10]  pioneered  the  cost  measures  work 
and depth that we use in this work. There are more advanced cost models that
take into account caches and IO (see, e.g., Blelloch and Harper [11]). However,
these works do not provide machine support for deriving static cost bounds.

9  Conclusion 

We  have  introduced  the  first  type-based  cost  analysis  for  deriving  bounds  on 
the depth of evaluations of parallel functional programs. The derived bounds are
multivariate resource polynomials that can express a wide range of relations
between different parts of the input. Like any type system, the analysis is naturally
compositional.  The  new  analysis  system  has  been  implemented  in  Resource  Aware 
ML  (RAML)  [14].  We  have  performed  a  thorough  and  reproducible  experimental 
evaluation  with  typical  examples  from  functional  programming  that  shows  the 
practicability  of  the  approach. 

An  extension  of  amortized  resource  analysis  to  handle  non-polynomial  bounds 
such  as  max  and  log  in  a  compositional  way  is  an  orthogonal  research  question 
that  we  plan  to  address  in  the  future.  A  promising  direction  that  we  are  currently 
studying  is  the  use  of  numerical  logical  variables  to  guide  the  analysis  to  derive 
non-polynomial bounds. The logical variables would be treated like regular
variables in the analysis. However, the user would be responsible for maintaining
and proving relations such as a = log n, where a is a logical variable and n is
the  size  of  a  regular  data  structure.  In  this  way,  we  would  gain  flexibility  while 
maintaining  the  compositionality  of  the  analysis. 

Another  orthogonal  question  is  the  extension  of  the  analysis  to  additional 
language  features  such  as  higher-order  functions,  references,  and  user-defined 
data  structures.  These  extensions  have  already  been  implemented  in  a  prototype 
and pose interesting research challenges in their own right. We plan to report on
them in a forthcoming article.

Abstract

This paper presents a new approach for automatically deriving worst-case
resource bounds for C programs. The described technique
combines ideas from amortized analysis and abstract interpretation
in a unified framework to address four challenges for state-of-the-art
techniques: compositionality, user interaction, generation
of  proof  certificates,  and  scalability.  Compositionality  is  achieved 
by  incorporating  the  potential  method  of  amortized  analysis.  It 
enables  the  derivation  of  global  whole-program  bounds  with  local 
derivation  rules  by  naturally  tracking  size  changes  of  variables  in 
sequenced  loops  and  function  calls.  The  resource  consumption 
of  functions  is  described  abstractly  and  a  function  call  can  be 
analyzed  without  access  to  the  function  body.  User  interaction  is 
supported  with  a  new  mechanism  that  clearly  separates  qualitative 
and  quantitative  verification.  A  user  can  guide  the  analysis  to 
derive  complex  non-linear  bounds  by  using  auxiliary  variables  and 
assertions.  The  assertions  are  separately  proved  using  established 
qualitative  techniques  such  as  abstract  interpretation  or  Hoare 
logic.  Proof  certificates  are  automatically  generated  from  the  local 
derivation  rules.  A  soundness  proof  of  the  derivation  system  with 
respect  to  a  formal  cost  semantics  guarantees  the  validity  of  the 
certificates.  Scalability  is  attained  by  an  efficient  reduction  of  bound 
inference  to  a  linear  optimization  problem  that  can  be  solved  by 
off-the-shelf LP solvers. The analysis framework is implemented
in the publicly available tool C4B. An experimental evaluation
demonstrates the advantages of the new technique with a comparison
of C4B with existing tools on challenging micro benchmarks and
the  analysis  of  more  than  2900  lines  of  C  code  from  the  cBench 
benchmark suite.

1. Introduction

In  software  engineering  and  software  verification,  we  often  would 
like  to  have  static  information  about  the  quantitative  behavior  of 
programs.  For  example,  stack  and  heap-space  bounds  are  important 
to  ensure  the  reliability  of  safety-critical  systems  [37].  Static  energy 
usage  information  is  critical  for  autonomous  systems  and  has 
applications in cloud computing [17, 18]. Worst-case time bounds
can help create constant-time implementations that prevent
side-channel attacks [9, 32]. Loop and recursion-depth bounds are
used  to  ensure  the  accuracy  of  programs  that  are  executed  on 
unreliable  hardware  [14]  and  complexity  bounds  are  needed  to 
verify  cryptographic  protocols  [8].  In  general,  quantitative  resource 
information  can  provide  useful  feedback  for  developers. 

Available techniques for automatically deriving worst-case resource
bounds fall into two categories. Techniques in the first category
derive impressive bounds for numerical imperative programs,
but  are  not  compositional.  This  is  problematic  if  one  needs  to  derive 
global  whole-program  bounds.  Techniques  in  the  second  category 
derive  tight  whole-program  bounds  for  programs  with  regular  loop 
or  recursion  patterns  that  decrease  the  size  of  an  individual  variable 
or data structure. They are highly compositional, scale to large
programs, and work directly on the syntax. However, they do not
support multivariate interval-based resource bounds (e.g., x - y),
which  are  common  in  C  programs.  Indeed,  it  has  been  a  long-time 
open  problem  to  develop  compositional  resource  analysis  techniques 
that  can  work  for  typical  imperative  code  with  non-regular  iteration 
patterns,  signed  integers,  mutation,  and  non-linear  control  flow. 

Tools  in  the  first  category  include  SPEED  [22],  KoAT  [13], 
PUBS  [1],  Rank  [3],  and  LOOPUS  [38].  They  lack  compositionality 
in  at  least  two  ways.  First,  they  all  base  their  analysis  on  some  form 
of  ranking  function  or  counter  instrumentation  that  is  linked  to  a 
local  analysis.  As  a  result,  loop  bounds  are  arithmetic  expressions 
that  depend  on  the  values  of  variables  just  before  the  loop.  This 
makes  it  hard  to  give  a  resource  bound  on  a  sequence  of  loops  and 
function  calls  in  terms  of  the  input  parameters  of  a  function.  Second, 
while  all  popular  imperative  programming  languages  provide  a 
function  or  procedure  abstraction,  available  tools  are  not  able  to 
abstract  resource  behavior;  instead,  they  have  to  inline  the  procedure 
body  to  perform  their  analysis. 

Tools in the second category originate from the potential method
of amortized analysis and type systems for functional programs [26,
28]. It has been shown that class definitions of object-oriented
programs [29] and data-structure predicates of separation logic [7] can
play the role of the type system in imperative programs. However, a
major weakness of existing potential-based techniques is that they
can only associate potential with individual program variables or
data structures. For C programs, this fails for loops as simple as
for (i=x; i<y; i++), where y - i decreases but |i| does not.

A  general  problem  with  existing  tools  (in  both  categories)  is 
user  interaction.  When  a  tool  fails  to  find  a  resource  bound  for  a 
program, there is no possibility for sound user interaction to guide
the tool during bound derivation. For example, there is no concept
of manual proofs of resource bounds, and no framework can support
composition  of  manually  derived  bounds  with  automatically  inferred 
bounds. 

This paper presents a new compositional framework for automatically
deriving resource bounds on C programs. This new approach
is an attempt to unify the two aforementioned categories: It solves
the compositionality issues of techniques for numerical imperative
code by adapting amortized-analysis-based techniques from the
functional world. Our automated analysis is able to infer resource
bounds on C programs with mutually-recursive functions and integer
loops. The resource behavior of functions can be summarized in
a functional specification that can be used at every call site without
accessing the function body. To our knowledge, this is the first technique
based on amortized analysis that is able to derive bounds that
depend on negative numbers and differences of variables. It is also
the first resource analysis technique for C that deals naturally with
recursive functions and sequenced loops, and can handle resources
that may become available during execution (e.g., when freeing
memory). Compared to more classical approaches based on ranking
functions, our tool inherits the benefits of amortized reasoning.
Using only one simple mechanism, it handles:

•  interactions  between  sequential  loops  or  function  calls  through 
size  changes  of  variables, 

• nested loops that influence each other with the same set of
modified  variables, 

•  and  amortized  bounds  as  found,  for  example,  in  the  Knuth- 
Morris-Pratt  algorithm  for  string  search. 

The main innovations that make amortized analysis work on imperative
languages are to base the analysis on a Hoare-like logic and
to track multivariate quantities instead of program variables. This
leads to precise bounds expressed as functions of sizes $|[x, y]| =
\max(0, y - x)$ of intervals. A distinctive feature of our analysis
system is that it reduces linear bound inference to a linear optimization
problem that can be solved by off-the-shelf LP solvers. This
enables  the  efficient  inference  of  global  bounds  for  larger  programs. 
Moreover,  our  local  inference  rules  automatically  generate  proof 
certificates  that  can  be  easily  checked  in  linear  time. 

The  use  of  the  potential  method  of  amortized  analysis  makes 
user  interaction  possible  in  different  ways.  For  one  thing,  we  can 
directly  combine  the  new  automatic  analysis  with  manually  derived 
bounds  in  a  previously-developed  quantitative  Hoare  logic  [15]  (see 
Section  7).  For  another  thing,  we  describe  a  new  mechanism  that 
allows  the  separation  of  quantitative  and  qualitative  verification 
(see  Section  6).  Using  this  mechanism,  the  user  can  guide  the 
analysis  by  using  auxiliary  variables  and  logical  assertions  that 
can  be  verified  by  existing  qualitative  tools  such  as  Hoare  logic  or 
abstract  interpretation.  In  this  way,  we  can  benefit  from  existing 
automation  techniques  and  provide  a  middle-ground  between  fully 
automatic  and  fully  manual  verification  for  bound  derivation.  This 
enables  the  semi-automatic  inference  of  non-linear  bounds,  such  as 
polynomial,  logarithmic,  and  exponential  bounds. 

We have implemented the analysis system in the tool C4B and
experimentally evaluated its effectiveness by analyzing system code
and examples from the literature. C4B has automatically derived
global  resource  bounds  for  more  than  2900  lines  of  C  code  from  the 
cBench  benchmark  suite.  The  extended  version  of  this  article  [16] 
contains  more  than  30  challenging  loop  and  recursion  patterns  that 
we  collected  from  open  source  software  and  the  literature.  Our 
analysis  can  find  asymptotically  tight  bounds  for  all  but  one  of  these 
patterns,  and  in  most  cases  the  derived  constant  factors  are  tight. 
To compare C4B with existing techniques, we tested our examples
with tools such as KoAT [13], Rank [3], and LOOPUS [38]. Our
experiments show that the bounds that we derive are often more
precise  than  those  derived  by  existing  tools.  Only  LOOPUS  [38], 
which  also  uses  amortization  techniques,  is  able  to  achieve  a  similar 
precision. 

Examples  from  cBench  and  micro  benchmarks  demonstrate  the 
practicality and expressiveness of the user-guided bound inference.
For  example,  we  derive  a  logarithmic  bound  for  a  binary  search 
function  and  a  bound  that  amortizes  the  cost  of  k  increments  to  a 
binary  counter  (see  Section  6). 

In  summary,  we  make  the  following  contributions. 

• We develop the first automatic amortized analysis for C programs.
It is naturally compositional, tracks size changes of variables
to derive global bounds, can handle mutually-recursive
functions, generates resource abstractions for functions, derives
proof certificates, and handles resources that may become available
during execution.

•  We  show  how  to  automatically  reduce  the  inference  of  linear 
resource  bounds  to  efficient  LP  solving. 

•  We  describe  a  new  method  of  harnessing  existing  qualitative 
verification  techniques  to  guide  the  automatic  amortized  analysis 
to  derive  non-linear  resource  bounds  with  LP  solving. 

•  We  prove  the  soundness  of  the  analysis  with  respect  to  a 
parametric  cost  semantics  for  C  programs.  The  cost  model  can 
be  further  customized  with  function  calls  (tick(n))  that  indicate 
resource  usage. 

• We implemented our resource bound analysis in the publicly
available tool C4B.

• We present experiments on more than 2900 lines of
C  code.  A  detailed  comparison  shows  that  our  prototype  is  the 
only  tool  that  can  derive  global  bounds  for  larger  C  programs 
while  being  as  powerful  as  existing  tools  when  deriving  linear 
local  bounds  for  tricky  loop  and  recursion  patterns. 

2.  The  Potential  Method 

The  idea  that  underlies  the  design  of  our  framework  is  amortized 
analysis [39]. Assume that a program $S$ executes on a starting state
$\sigma$ and consumes $n$ resource units of some user-defined quantity. We
denote that by writing $(S, \sigma) \Downarrow_n \sigma'$, where $\sigma'$ is the program state
after the execution. The basic idea of amortized analysis is to define
a potential function $\Phi$ that maps program states to non-negative
numbers and to show that $\Phi(\sigma) \ge n$ if $\sigma$ is a program state such
that $(S, \sigma) \Downarrow_n \sigma'$. Then $\Phi(\sigma)$ is a valid resource bound.

To obtain compositional reasoning, we also have to take into
account the state resulting from a program's execution. We thus use
two potential functions, one that applies before the execution, and
one that applies after. The two functions must respect the relation
$\Phi(\sigma) \ge n + \Phi'(\sigma')$ for all states $\sigma$ and $\sigma'$ such that $(S, \sigma) \Downarrow_n \sigma'$.
Intuitively, $\Phi(\sigma)$ must provide enough potential both for paying for
the resource cost of the computation and for paying for the potential
$\Phi'(\sigma')$ on the resulting state $\sigma'$. That way, if $(\sigma, S_1) \Downarrow_n \sigma'$ and
$(\sigma', S_2) \Downarrow_m \sigma''$, we get $\Phi(\sigma) \ge n + \Phi'(\sigma')$ and $\Phi'(\sigma') \ge
m + \Phi''(\sigma'')$. This can be composed as $\Phi(\sigma) \ge (n + m) + \Phi''(\sigma'')$.

Note that the initial potential function $\Phi$ provides an upper bound
on the resource consumption of the whole program. What we have
observed is that, if we define $\{\Phi\}\ S\ \{\Phi'\}$ to mean

$$\forall \sigma\ n\ \sigma'.\ (\sigma, S) \Downarrow_n \sigma' \implies \Phi(\sigma) \ge n + \Phi'(\sigma'),$$

then we get the following familiar looking rule

$$\frac{\{\Phi\}\ S_1\ \{\Phi'\} \qquad \{\Phi'\}\ S_2\ \{\Phi''\}}{\{\Phi\}\ S_1; S_2\ \{\Phi''\}}$$

    $\{\cdot;\ 0 + \frac{T}{K} \cdot |[x, y]|\}$
    while (x+K<=y) {
      $\{x + K \le y;\ 0 + \frac{T}{K} \cdot |[x, y]|\}$
      x=x+K;
      $\{x \le y;\ T + \frac{T}{K} \cdot |[x, y]|\}$
      tick(T);
      $\{x \le y;\ 0 + \frac{T}{K} \cdot |[x, y]|\}$
    }
    $\{x \ge y;\ 0 + \frac{T}{K} \cdot |[x, y]|\}$

Figure 1. Derivation of a tight bound on the number of ticks for
a standard for loop. The parameters $K > 0$ and $T > 0$ are not
program variables but denote concrete constants.



This rule already shows a departure from classical techniques that are
based on ranking functions. Reasoning with two potential functions
promotes compositional reasoning by focusing on the sequencing of
programs. In the previous rule, $\Phi$ gives a bound for $S_1; S_2$ through
the intermediate potential $\Phi'$, even though it was derived on $S_1$ only.
Similarly,  other  language  constructs  lead  to  rules  for  the  potential 
functions  that  look  very  similar  to  Hoare  logic  or  effect  system  rules. 
These  rules  enable  reasoning  about  resource  usage  in  a  flexible  and 
compositional  way,  which,  as  a  side  effect,  produces  a  certificate  for 
the  derived  resource  bound. 

The derivation of a resource bound using potential functions is
best explained by example. If we use the tick metric that assigns
cost $n$ to the function call tick(n) and cost 0 to all other operations,
then the cost of the following example can be bounded by $|[x, y]| =
\max(y - x, 0)$.

while (x<y) { x=x+1; tick(1); }    (Example 1)

To derive this bound, we start with the initial potential $\Phi_0 = |[x, y]|$,
which we also use as the loop invariant. For the loop body we have
(like in Hoare logic) to derive a triple $\{\Phi_0\}$ x=x+1; tick(1) $\{\Phi_0\}$.
We can only do so if we utilize the fact that $x < y$ at the beginning
of the loop body. The reasoning then works as follows. We start
with the potential $|[x, y]|$ and the fact that $|[x, y]| > 0$ before
the assignment. If we denote the updated version of $x$ after the
assignment by $x'$, then the relation $|[x, y]| = |[x', y]| + 1$ between
the potential before and after the assignment x = x + 1 holds. This
means that we have the potential $|[x, y]| + 1$ before the statement
tick(1). Since tick(1) consumes one resource unit, we end up with
potential $|[x, y]|$ after the loop body and have established the loop
invariant again.
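
The argument for Example 1 can be replayed at run time. The following is only an illustrative sketch, not part of the analysis itself: the helper names (interval_size, run_example1) and the global tick counter are assumptions made for this example. It starts with potential |[x, y]|, spends one unit per tick(1), and asserts after every iteration that the remaining potential again equals |[x, y]|.

```c
#include <assert.h>

/* Interval size |[a, b]| = max(0, b - a), as defined in the text. */
long interval_size(long a, long b) {
    return b > a ? b - a : 0;
}

/* Hypothetical instrumentation: a counter standing in for the tick metric. */
static long ticks_used = 0;
void tick(long n) { ticks_used += n; }

/* Runs Example 1 and returns the number of ticks it consumed. */
long run_example1(long x, long y) {
    ticks_used = 0;
    long potential = interval_size(x, y);   /* initial potential Phi_0 */
    while (x < y) {
        x = x + 1;       /* frees one unit: |[x, y]| = |[x', y]| + 1 */
        tick(1);
        potential -= 1;  /* the freed unit pays for tick(1) */
        /* Loop invariant: the remaining potential equals |[x, y]|. */
        assert(potential == interval_size(x, y));
    }
    return ticks_used;
}
```

For instance, run_example1(0, 5) consumes 5 ticks, which matches |[0, 5]| exactly, and run_example1(7, 3) consumes none.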

Figure 1 shows a derivation of the bound $\frac{T}{K}|[x, y]|$ on the
number of ticks for a generalized version of Example 1 in which we
increment $x$ by a constant $K > 0$ and consume $T > 0$ resources
in each iteration. The reasoning is similar to the one of Example 1
except that we obtain the potential $K \cdot \frac{T}{K}$ after the assignment. In
the figure, we separate logical assertions from potential functions
with semicolons. Note that the logical assertions are only used in
the rule for the assignment x = x + K.

To the best of our knowledge, no other implemented tool for C is
currently capable of deriving a tight bound on the cost of such a loop.
For $T = 1$ (many systems focus on the number of loop iterations
without a cost model) and $K = 10$, KoAT computes the bound
$|x| + |y| + 10$, Rank computes the bound $y - x - 7$, and LOOPUS
computes the bound $y - x - 9$. Only PUBS computes the tight bound
$0.1(y - x)$ if we translate the program into a term-rewriting system
by hand. We will show in the following sections that the potential
method makes automatic bound derivation straightforward.
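
As a sanity check of the bound from Figure 1, the generalized loop can be executed and compared against (T/K)·|[x, y]|. This is a minimal sketch under our own assumptions: the function names are hypothetical, and the comparison is multiplied through by K so that it stays in integer arithmetic.

```c
#include <assert.h>

/* Interval size |[a, b]| = max(0, b - a). */
long interval_size(long a, long b) {
    return b > a ? b - a : 0;
}

/* Runs: while (x+K<=y) { x=x+K; tick(T); } and returns the total ticks. */
long run_generalized_loop(long x, long y, long K, long T) {
    assert(K > 0 && T > 0);   /* K and T are positive constants */
    long ticks_used = 0;
    while (x + K <= y) {
        x = x + K;
        ticks_used += T;      /* stands in for tick(T) */
    }
    return ticks_used;
}

/* Checks ticks <= (T/K) * |[x, y]| as ticks * K <= T * |[x, y]|. */
int bound_holds(long x, long y, long K, long T) {
    return run_generalized_loop(x, y, K, T) * K <= T * interval_size(x, y);
}
```

With T = 1 and K = 10, the loop started at x = 0, y = 100 uses exactly 10 ticks, which coincides with 0.1(y - x).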

The  concept  of  a  potential  function  is  a  generalization  of  the 
concept  of  a  ranking  function.  A  potential  function  can  be  used  like 


a  ranking  function  if  we  use  the  tick  metric  and  add  the  statement 
tick(l)  to  every  back  edge  of  the  program  (loops  and  function  calls). 
However, a potential function is more flexible. For example, we can
use  a  potential  function  to  prove  that  Example  2  does  not  consume 
any  resources  in  the  tick  metric. 

while (x<y) { tick(-1); x=x+1; tick(1); }    (Example 2)

while (x<y) { x=x+1; tick(10); }    (Example 3)

Similarly we can prove that Example 3 can be bounded by $10|[x, y]|$.
In  both  cases,  we  reason  exactly  like  in  the  first  version  of  the  while 
loop  to  prove  the  bound.  Of  course,  such  loops  with  different  tick 
annotations  can  be  seamlessly  combined  in  a  larger  program. 
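
The two claims can be observed with a small resource meter. The sketch below assumes that the quantity of interest is the peak of the net resource usage (so that tick with a negative argument returns resources); the meter type and the function names are hypothetical and only serve this illustration.

```c
#include <assert.h>

/* Hypothetical resource meter: net usage and its peak (high-water mark). */
typedef struct { long net; long peak; } meter;

void tick(meter *m, long n) {
    m->net += n;                          /* a negative n returns resources */
    if (m->net > m->peak) m->peak = m->net;
}

/* Example 2: while (x<y) { tick(-1); x=x+1; tick(1); } */
meter run_example2(long x, long y) {
    meter m = {0, 0};
    while (x < y) { tick(&m, -1); x = x + 1; tick(&m, 1); }
    return m;
}

/* Example 3: while (x<y) { x=x+1; tick(10); } */
meter run_example3(long x, long y) {
    meter m = {0, 0};
    while (x < y) { x = x + 1; tick(&m, 10); }
    return m;
}

/* Checks both claims from the text for one pair of inputs. */
void check_examples(long x, long y) {
    long size = y > x ? y - x : 0;                  /* |[x, y]| */
    assert(run_example2(x, y).peak == 0);           /* no resources needed */
    assert(run_example3(x, y).peak <= 10 * size);   /* bounded by 10|[x, y]| */
}
```

Calling check_examples(0, 5) passes: Example 2 never rises above zero, and Example 3 peaks at 50 = 10·|[0, 5]|.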

3.  Compositional  Resource-Bound  Analysis 

In  this  section  we  describe  the  high-level  design  of  the  automatic 
amortized analysis that we implemented in C4B. Examples explain
and  motivate  our  design  decisions. 

Linear Potential Functions. To find resource bounds automatically,
we first need to restrict our search space. In this work, we focus
on the following form of potential functions, which can express
tight bounds for many typical programs and allows for inference
with linear programming.

$$\Phi(\sigma) = q_0 + \sum_{x, y \in \mathrm{dom}(\sigma) \wedge x \neq y} q_{(x,y)} \cdot |[\sigma(x), \sigma(y)]|$$

Here $\sigma : (\mathit{Locals} \to \mathbb{Z}) \times (\mathit{Globals} \to \mathbb{Z})$ is a simplified
program state that maps variable names to integers, $|[a, b]| =
\max(0, b - a)$, and $q_i \in \mathbb{Q}^+_0$. To simplify the references to the
linear coefficients $q_i$, we introduce an index set $I$. This set is
defined to be $\{0\} \cup \{(x, y) \mid x, y \in \mathit{Var} \wedge x \neq y\}$. Each index
$i$ corresponds to a base function $f_i$ in the potential function: $0$
corresponds to the constant function $\sigma \mapsto 1$, and $(x, y)$ corresponds
to $\sigma \mapsto |[\sigma(x), \sigma(y)]|$. Using these notations we can rewrite the
above equality as $\Phi(\sigma) = \sum_{i \in I} q_i f_i(\sigma)$. We often write $xy$ to
denote the index $(x, y)$. This allows us to uniquely represent any
linear potential function $\Phi$ as a quantitative annotation $Q = (q_i)_{i \in I}$,
that is, a family of non-negative rational numbers where only a finite
number of elements are not zero.
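
To make the shape of these potential functions concrete, the following sketch evaluates a quantitative annotation on a program state. It is an illustration under simplifying assumptions that are ours, not the tool's: a fixed number NVARS of variables identified by array index, coefficients stored as doubles rather than exact rationals, and hypothetical type and function names.

```c
#include <assert.h>
#include <stddef.h>

#define NVARS 3   /* hypothetical: number of tracked variables */

/* Quantitative annotation Q: q0 plus one coefficient per ordered pair. */
typedef struct {
    double q0;
    double q[NVARS][NVARS];   /* q[i][j] weights |[sigma(i), sigma(j)]| */
} annotation;

/* Interval size |[a, b]| = max(0, b - a). */
double interval_size(long a, long b) {
    return b > a ? (double)(b - a) : 0.0;
}

/* Phi(sigma) = q0 + sum over x != y of q_(x,y) * |[sigma(x), sigma(y)]|. */
double potential(const annotation *Q, const long sigma[NVARS]) {
    double phi = Q->q0;
    for (size_t i = 0; i < NVARS; i++) {
        for (size_t j = 0; j < NVARS; j++) {
            if (i == j) continue;          /* the index set excludes x = y */
            assert(Q->q[i][j] >= 0.0);     /* coefficients are non-negative */
            phi += Q->q[i][j] * interval_size(sigma[i], sigma[j]);
        }
    }
    return phi;
}
```

For example, with variables ordered as (c_0, y, z), the annotation q0 = 2, q_(y,z) = 3, q_(c_0,y) = 0.5 evaluated on the state (0, 5, 25) gives 2 + 3·20 + 0.5·5 = 64.5.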

In the potential functions, we treat constants as global variables
that cannot be assigned to. For example, if the program contains the
constant 1988 then we have a variable $c_{1988}$ and $\sigma(c_{1988}) = 1988$.
We assume that every program state includes the constant $c_0$.

Abstract Program State. In addition to the quantitative annotations,
our automatic amortized analysis needs to maintain a minimal
abstract state to justify certain operations on quantitative annotations.
For example, when analyzing the code $x \leftarrow x + y$, it is helpful to
know the sign of $y$ to determine which intervals will increase or
decrease. The knowledge needed by our rules can be inferred by
local reasoning (i.e., in basic blocks without recursion and loops)
within usual theories (e.g., Presburger arithmetic or bit vectors).

The abstract program state is represented as logical contexts in
the derivation system used by our automated tool. Our implementation
finds these logical contexts using abstract interpretation with
the  domain  of  linear  inequalities.  We  observed  that  the  rules  of  the 
analysis  often  require  only  minimal  local  knowledge.  This  means 
that  it  is  not  necessary  for  us  to  compute  precise  loop  invariants  and 
only  a  rough  fixpoint  (e.g.  keeping  only  inequalities  on  variables 
unchanged  by  the  loop)  is  sufficient  to  obtain  good  bounds. 

Challenging  Loops.  One  might  think  that  our  set  of  potential 
functions  is  too  simplistic  to  be  able  to  express  and  prove  bounds 
for  realistic  programs.  Nevertheless,  we  can  handle  challenging 
example programs without special tricks or techniques. Examples

speed_1 (derived bound: $|[x, n]| + |[y, m]|$)

    while (n>x) {
      $\{n > x;\ |[x,n]| + |[y,m]|\}$
      if (m>y)
        $\{m > y;\ |[x,n]| + |[y,m]|\}$
        y=y+1;
        $\{\cdot;\ 1 + |[x,n]| + |[y,m]|\}$
      else
        $\{n > x;\ |[x,n]| + |[y,m]|\}$
        x=x+1;
        $\{\cdot;\ 1 + |[x,n]| + |[y,m]|\}$
      $\{\cdot;\ 1 + |[x,n]| + |[y,m]|\}$
      tick(1);
    } $\{\cdot;\ |[x,n]| + |[y,m]|\}$

speed_2 (derived bound: $|[x, n]| + |[z, n]|$)

    while (x<n) {
      $\{x < n;\ |[x,n]| + |[z,n]|\}$
      if (z>x)
        $\{x < n;\ |[x,n]| + |[z,n]|\}$
        x=x+1;
        $\{\cdot;\ 1 + |[x,n]| + |[z,n]|\}$
      else
        $\{z \le x, x < n;\ |[x,n]| + |[z,n]|\}$
        z=z+1;
        $\{\cdot;\ 1 + |[x,n]| + |[z,n]|\}$
      $\{\cdot;\ 1 + |[x,n]| + |[z,n]|\}$
      tick(1);
    } $\{\cdot;\ |[x,n]| + |[z,n]|\}$

t08a (derived bound: $3.1|[y, z]| + 0.1|[0, y]|$)

    while (z-y>0) {
      $\{y < z;\ 3.1|[y,z]| + 0.1|[0,y]|\}$
      y=y+1;
      $\{\cdot;\ 3 + 3.1|[y,z]| + 0.1|[0,y]|\}$
      tick(3);
      $\{\cdot;\ 3.1|[y,z]| + 0.1|[0,y]|\}$
    }
    $\{\cdot;\ 3.1|[y,z]| + 0.1|[0,y]|\}$
    while (y>9) {
      $\{y > 9;\ 3.1|[y,z]| + 0.1|[0,y]|\}$
      y=y-10;
      $\{\cdot;\ 1 + 3.1|[y,z]| + 0.1|[0,y]|\}$
      tick(1);
    } $\{\cdot;\ 3.1|[y,z]| + 0.1|[0,y]|\}$

t27 (derived bound: $59|[n, 0]| + 0.05|[0, y]|$)

    while (n<0) {
      $\{n < 0;\ P(n,y)\}$
      n=n+1;
      $\{\cdot;\ 59 + P(n,y)\}$
      y=y+1000;
      $\{\cdot;\ 9 + P(n,y)\}$
      while (y>=100 && *) {
        $\{y > 99;\ 9 + P(n,y)\}$
        y=y-100;
        $\{\cdot;\ 14 + P(n,y)\}$
        tick(5);
      } $\{\cdot;\ 9 + P(n,y)\}$
      tick(9);
    } $\{\cdot;\ P(n,y)\}$

Figure 2. Derivations of bounds on the number of ticks for challenging examples. Examples speed_1 and speed_2 (from [22]) use tricky
iteration patterns, t08a contains sequential loops so that the iterations of the second loop depend on the first, and t27 contains interacting
nested loops. In Example t27, we use the abbreviation $P(n, y) := 59|[n, 0]| + 0.05|[0, y]|$.


t39 (derived bounds: $0.33 + 0.67|[y, x]|$ for c_down(x, y) and $0.67|[y, x]|$ for c_up(x, y))

    void c_down (int x, int y) {
      if (x>y) { tick(1); c_up(x-1,y); }
    }
    void c_up (int x, int y) {
      if (y+1<x) { tick(1); c_down(x,y+2); }
    }

t61 (derived bounds: $\frac{N}{8}|[0, l]|$ if $N \ge 8$, and $7 \cdot \frac{8-N}{8} + \frac{N}{8}|[0, l]|$ if $N < 8$)

    for (; l>=8; l-=8)
      /* process one block */
      tick(N);
    for (; l>0; l--)
      /* save leftovers */
      tick(1);

t62 (derived bound: $2 + 3|[l, h]|$)

    for (;;) {
      do { l++; tick(1); }
      while (l<h && *);
      do { h--; tick(1); }
      while (h>l && *);
      if (h<=l) break;
      tick(1); /* swap elems. */ }

Figure 3. Example t39 shows two mutually-recursive functions with the computed tick bounds. Examples t61 and t62 demonstrate the unique
compositionality of our system. In t61, $N \ge 0$ is a fixed but arbitrary constant.


speed_1 and speed_2 in Figure 2, which are taken from previous
work [22], demonstrate that our method can handle tricky iteration
patterns. The SPEED tool [22] derives the same bounds as our
analysis but requires heuristics for its counter instrumentation. These
loops can also be handled with inference of disjunctive invariants,
but in the abstract interpretation community, these invariants are
known to be notoriously difficult to generate. In Example speed_1
we have one loop that first increments variable $y$ up to $m$ and
then increments variable $x$ up to $n$. We derive the tight bound
$|[x, n]| + |[y, m]|$. Example speed_2 is even trickier, and we found
it hard to find a bound manually. However, using potential transfer
reasoning as in amortized analysis, it is easy to prove the tight bound
$|[x, n]| + |[z, n]|$.

Nested and Sequenced Loops. Example t08a in Figure 2 shows
the ability of the analysis to discover interaction between sequenced
loops through size change of variables. We accurately track the size
change of $y$ in the first loop by transferring the potential 0.1 from
$|[y, z]|$ to $|[0, y]|$. Furthermore, t08a shows again that we do not
handle the constants 1 or 0 in any special way. In all examples we
could replace 0 and 1 with other constants, as in the second loop,
and still derive a tight bound. Example t27 in Figure 2 shows how
amortization can be used to handle interacting nested loops. In the
outer loop we increment the variable $n$ until $n = 0$. In each of the
$|[n, 0]|$ iterations, we increment the variable $y$ by 1000. Then we
non-deterministically (expressed by $*$) execute an inner loop that
decrements $y$ by 100 until $y < 100$. The analysis discovers that
only the first execution of the inner loop depends on the initial value
of $y$. We again derive tight constant factors.
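
The interaction between the two loops of t08a can be checked empirically. The following sketch runs the example and compares the tick count with the bound 3.1|[y, z]| + 0.1|[0, y]|, scaled by 10 so that only integers are involved; the function names and the input range guard are assumptions made for this illustration.

```c
#include <assert.h>

/* Interval size |[a, b]| = max(0, b - a). */
long interval_size(long a, long b) {
    return b > a ? b - a : 0;
}

/* Runs Example t08a and returns the total number of ticks. */
long run_t08a(long y, long z) {
    long ticks_used = 0;
    while (z - y > 0) { y = y + 1;  ticks_used += 3; }   /* tick(3) */
    while (y > 9)     { y = y - 10; ticks_used += 1; }   /* tick(1) */
    return ticks_used;
}

/* Checks 10 * ticks <= 31 * |[y, z]| + 1 * |[0, y]|. */
int t08a_bound_holds(long y, long z) {
    /* Keep inputs small so that the scaled products cannot overflow. */
    assert(y > -1000000 && y < 1000000 && z > -1000000 && z < 1000000);
    return 10 * run_t08a(y, z)
           <= 31 * interval_size(y, z) + interval_size(0, y);
}
```

For y = 0 and z = 20 the program uses 62 ticks (60 in the first loop, 2 in the second), which equals 3.1·20, so the constant factors are tight on this input.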

Mutually  Recursive  Functions.  As  mentioned,  the  analysis  also 
handles  advanced  control  flow  like  break  and  return  statements,  and 
mutual  recursion.  Example  t39  in  Figure  3  contains  two  mutually- 
recursive  functions  with  their  automatically  derived  tick  bounds. 
The  function  c_down  decrements  its  first  argument  x  until  it  reaches 
the  second  argument  y.  It  then  recursively  calls  the  function  c_up, 
which is dual to c_down. Here, we count up y by 2 and call c_down.
C4B is the only available system that computes a tight bound.

Compositionality.  With  two  concrete  examples  from  open-source 
projects  we  demonstrate  that  the  compositionality  of  our  method  is 
indeed  crucial  in  practice. 

Example  t61  in  Figure  3  is  typical  for  implementations  of  block- 
based  cryptographic  primitives:  Data  of  arbitrary  length  is  consumed 
in  blocks  and  the  leftover  is  stored  in  a  buffer  for  future  use  when 
more  data  is  available.  It  is  present  in  all  the  block  encryption 
routines  of  PGP  and  also  used  in  performance  critical  code  to  unroll 
a  loop.  For  example  we  found  it  in  a  bit  manipulating  function  of  the 
libtiff  library  and  a  CRC  computation  routine  of  MAD,  an  MPEG 
decoder.  This  looping  pattern  is  handled  particularly  well  by  our 
method. If $N \geq 8$, C4B infers the bound $\frac{N}{8}|[0, l]|$, but if
$N < 8$, it infers $7 \cdot \frac{8-N}{8} + \frac{N}{8}|[0, l]|$. The
selection of the block size (8) and the cost in the second loop (tick(1))
are arbitrary choices and C4B would also derive a tight bound for other
values.

To understand the resource bound for the case $N < 8$, first note
that the cost of the second loop is $|[0, l]|$. After the first loop, we
still have $\frac{N}{8}|[0, l]|$ potential available from the invariant. So
we have to raise the potential of $|[0, l]|$ from $\frac{N}{8}$ to 1, that
is, we must pay $\frac{8-N}{8}|[0, l]|$. But since we got out of the first
loop, we know that $l < 8$, so it is sound to only pay
$7 \cdot \frac{8-N}{8}$ potential units instead.
This level of precision and compositionality is only achieved by our
novel analysis; no other available tool derives the aforementioned
tight bounds.
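
The arithmetic behind the t61 bound can be checked with a small
simulation. The sketch below is not the code of Figure 3: it assumes the
loop shape described in the text (a first loop that consumes l in blocks
of 8 at cost N per block, then a second loop that consumes the leftover
at cost 1 per element), and the names t61_cost and t61_bound are
hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

BLOCK = 8  # block size used in the text


def interval(a, b):
    """Size |[a, b]| = max(0, b - a) of the interval between a and b."""
    return max(0, b - a)


def t61_cost(l, n):
    """Tick cost of the assumed two-loop pattern (length l, block cost n)."""
    cost = 0
    while l >= BLOCK:   # consume full blocks, paying n ticks per block
        l -= BLOCK
        cost += n
    while l > 0:        # consume the leftover, paying 1 tick per element
        l -= 1
        cost += 1
    return cost


def t61_bound(l, n):
    """(N/8)|[0,l]| if N >= 8, otherwise 7(8-N)/8 + (N/8)|[0,l]|."""
    linear = Fraction(n, BLOCK) * interval(0, l)
    if n >= BLOCK:
        return linear
    return Fraction(7 * (BLOCK - n), BLOCK) + linear


# The bound is met exactly when seven elements are left over.
print(t61_cost(15, 4), t61_bound(15, 4))
```

Under these assumptions the bound exceeds the simulated cost by
$(1 - N/8)(7 - r)$ for $N < 8$, where r is the number of leftover
elements, which is why the constant 7 cannot be lowered.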

Example t62 (Figure 3) is the inner loop of a quick sort
implementation in cBench. More precisely, it is the partitioning part of
the algorithm. This partition loop has linear complexity, and feeding it
to our analysis gives the worst-case bound $2 + 3|[l, h]|$. This bound
is not optimal but it can be refined by rewriting the program. To
understand the bound, we can reason as follows. If $h \leq l$ initially,
the cost of the loop is 2. Otherwise, the cost of each round (at most
3) can be paid using the potential of $[l, h]$ by the first increment
to l because we know that $l < h$. The two inner loops can also use
$[l, h]$ to pay for their inner costs. KoAT fails to find a bound and
LOOPUS derives the quadratic bound $(h - l - 1)^2$. Following the
classical technique, these tools try to find one ranking function for
each loop and combine them multiplicatively or additively.

In  the  extended  version  [16]  is  a  list  of  more  than  30  classes  of 
challenging  programs  that  we  can  automatically  analyze.  Section  8 
contains  a  more  detailed  comparison  with  other  tools. 

4.  Derivation  System 

In  the  following  we  describe  the  local  and  compositional  derivation 
rules  of  the  automatic  amortized  analysis. 

Cost Aware Clight. We present the rules for a subset of Clight.
Clight is the first intermediate language of the CompCert
compiler [34]. It is a subset of C with a unified looping construct and
side-effect free expressions. We reuse most of CompCert's syntax
but instrument the semantics with a resource metric M that accounts
for the cost (an arbitrary rational number) of each step in the
operational semantics. For example, $M_e(exp)$ is the cost of evaluating
the expression exp. The rationals $M_f$ and $M_r$ account respectively
for the cost of a call to the function f and the cost of returning from
it. More details are provided in Section 7.

In the rules, assignments are restricted to the form $x \leftarrow y$ or
$x \leftarrow x \pm y$. In the implementation, a Clight program is converted
into this form prior to analysis without changing the resource cost.
This is achieved by using a series of cost-free assignments that do
not result in additional cost in the semantics. Non-linear operations
such as $x \leftarrow z * y$ or $x \leftarrow a[y]$ are handled by assigning
0 to coefficients like $q_{xa}$ and $q_{ax}$ that contain x after the
assignment. This sound treatment ensures that no further loop bounds
depend on the result of the non-linear operation.

Judgements. The derivation system for the automatic amortized
analysis is defined in Figure 4. The derivation rules derive
judgements of the form

\[ (\Gamma_B; Q_B), (\Gamma_R; Q_R) \vdash \{\Gamma; Q\}\ S\ \{\Gamma'; Q'\}. \]

The part $\{\Gamma; Q\}\ S\ \{\Gamma'; Q'\}$ of the judgement can be seen as a
quantitative Hoare triple. All assertions are split into two parts,
the logical part and the quantitative part. The quantitative part
Q represents a potential function as a collection of non-negative
numbers $q_i$ indexed by the index set I. The logical part $\Gamma$ is left
abstract but is enforced by our derivation system to respect classic
Hoare logic constraints. The meaning of this basic judgement is as
follows: If S is executed with starting state $\sigma$, the assertions in
$\Gamma$ hold, and at least $Q(\sigma)$ resources are available then the
evaluation does not run out of resources and, if the execution
terminates in state $\sigma'$, there are at least $Q'(\sigma')$ resources left
and $\Gamma'$ holds for $\sigma'$.

The judgement is a bit more involved since we have to take into
account the early exit statements break and return. This is similar
to classical Hoare triples in the presence of non-linear control flow.
In the judgement, $(\Gamma_B; Q_B)$ is the postcondition that holds when
breaking out of a loop using break. Similarly, $(\Gamma_R; Q_R)$ is the
postcondition that holds when returning from a function call.

As a convention, if Q and Q' are quantitative annotations we
assume that $Q = (q_i)_{i \in I}$ and $Q' = (q'_i)_{i \in I}$. The notation
$Q + n$ used in many rules defines a new context Q' such that
$q'_0 = q_0 + n$ and $\forall i \neq 0.\ q'_i = q_i$. In all the rules, we
have the implicit side condition that all rational coefficients are
non-negative. Finally, if a rule mentions Q and Q' and leaves the
latter undefined at some index i we assume that $q'_i = q_i$.

Function Specifications. During the analysis, function
specifications are quadruples $(\Gamma_f; Q_f, \Gamma'_f; Q'_f)$ where
$\Gamma_f; Q_f$ depend on args, and $\Gamma'_f; Q'_f$ depend on ret. These
parameters are instantiated by appropriate variables on call sites. A
distinctive feature of our analysis is that it respects the function
abstraction: when deriving a function specification it generates a set
of constraints and the above quadruple; once done, the constraint set
can readily be reused for every call site and the function need not be
analyzed multiple times. Therefore, the derivation rules are parametric
in a function context $\Delta$ that we leave implicit in the rules presented
here. More details can be found in the extended version.

Derivation Rules. The rules of our derivation system must serve
two purposes. They must attach potential to certain program variable
intervals and use this potential, when it is allowed, to pay for
resource consuming operations. These two purposes are illustrated
by the Q:Skip rule. This rule reuses its precondition as
postcondition, which is explained by two facts: First, no resource is
consumed by the skip operation, thus no potential has to be used to pay
for the evaluation. Second, the program state is not changed by the
execution of a skip statement. Thus all potential available before the
execution of the skip statement is still available after.

The rules Q:IncP, Q:DecP, and Q:Inc describe how the
potential is distributed after a size change of a variable. The rule
Q:IncP is for increments $x \leftarrow x + y$ and Q:DecP is for decrements
$x \leftarrow x - y$. They both apply only when we can deduce from the
logical context $\Gamma$ that $y \geq 0$. Of course, there are symmetrical
rules Q:IncN and Q:DecN (not presented here) that can be applied if
y is negative. The rules are all equivalent in the case where y = 0.
The rule Q:Inc can be applied if we cannot find the sign of y.

To explain how rules for increment and decrement work, it is
sufficient to understand the rule Q:IncP. The others follow the same
idea and are symmetrical. In Q:IncP, the program updates a variable
x with x + y where $y \geq 0$. Since x is changed, the quantitative
annotation must be updated to reflect the change of the program state.
We write x' for the value of x after the assignment. Since x is the
only variable changed, only intervals of the form $[u, x]$ and $[x, u]$
will be resized. Note that for any u, $[x, u]$ will get smaller with the
update, and if $x' \in [x, u]$ we have $|[x, u]| = |[x, x']| + |[x', u]|$.
But $|[x, x']| = |[0, y]|$ which means that the potential $q'_{0y}$ in
the postcondition can be increased by $q_{xu}$ under the guard that
$x' \in [x, u]$. Dually, the interval $[v, x]$ can get bigger with the
update. We know that $|[v, x']| \leq y + |[v, x]|$. So we decrease the
potential of $[0, y]$ by $q_{vx}$ to pay for this change. The rule ensures
this only for $v \notin U$ because $x \leq v$ otherwise, and thus
$|[v, x]| = 0$.
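
The two interval facts that Q:IncP relies on can be checked numerically.
The following sketch is only an illustration of the arithmetic, not part
of the analysis: the helper names interval and check_incp are
hypothetical, and u and v stand for arbitrary integer values of the other
interval endpoints.

```python
def interval(a, b):
    """Size |[a, b]| = max(0, b - a) of the interval between a and b."""
    return max(0, b - a)


def check_incp(x, y, u, v):
    """Check the facts used by Q:IncP for the update x' = x + y, y >= 0."""
    assert y >= 0
    x_new = x + y
    # Intervals [x, u] shrink: when x' lies in [x, u], the part that is
    # lost has exactly the size |[0, y]|, so its potential can be moved.
    if x <= x_new <= u:
        shrink_ok = interval(x, u) == interval(0, y) + interval(x_new, u)
    else:
        shrink_ok = True  # outside the guard no potential is reclaimed
    # Intervals [v, x] grow by at most y, so |[0, y]| can pay for them.
    grow_ok = interval(v, x_new) <= y + interval(v, x)
    return shrink_ok and grow_ok


print(all(check_incp(x, y, u, v)
          for x in range(-6, 7) for y in range(0, 7)
          for u in range(-6, 14) for v in range(-6, 14)))
```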

The rule Q:Loop is a cornerstone of our analysis. To apply it on
a loop body, one needs to find an invariant potential Q that will pay
for the iterations. At each iteration, $M_l$ resources are spent to jump
back. This explains the postcondition $Q + M_l$. Since the loop can
only be exited with a break statement, the postcondition $\{\Gamma'; Q'\}$ for
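\[ \frac{}{B, R \vdash \{\Gamma; Q\}\ \mathtt{skip}\ \{\Gamma; Q\}}\ (\text{Q:Skip}) \]

\[ \frac{}{B, R \vdash \{\Gamma; Q + M_a\}\ \mathtt{assert}\ e\ \{\Gamma \wedge e; Q\}}\ (\text{Q:Assert}) \]

\[ \frac{}{B, R \vdash \{\Gamma; Q + M_t(n)\}\ \mathtt{tick}(n)\ \{\Gamma; Q\}}\ (\text{Q:Tick}) \]

\[ \frac{}{(\Gamma; Q_B), R \vdash \{\Gamma; Q_B + M_b\}\ \mathtt{break}\ \{\Gamma'; Q'\}}\ (\text{Q:Break}) \]

\[ \frac{P = Q_R[ret/x] \qquad \Gamma = \Gamma_R[ret/x] \qquad \forall i \in dom(P).\ p_i = q_i}
        {B, (\Gamma_R; Q_R) \vdash \{\Gamma; Q\}\ \mathtt{return}\ x\ \{\Gamma'; Q'\}}\ (\text{Q:Return}) \]

\[ \frac{B, R \vdash \{\Gamma; Q\}\ S_1\ \{\Gamma'; Q' + M_s\} \qquad B, R \vdash \{\Gamma'; Q'\}\ S_2\ \{\Gamma''; Q''\}}
        {B, R \vdash \{\Gamma; Q\}\ S_1; S_2\ \{\Gamma''; Q''\}}\ (\text{Q:Seq}) \]

\[ \frac{(\Gamma'; Q'), R \vdash \{\Gamma; Q\}\ S\ \{\Gamma; Q + M_l\}}
        {B, R \vdash \{\Gamma; Q\}\ \mathtt{loop}\ S\ \{\Gamma'; Q'\}}\ (\text{Q:Loop}) \]

\[ \frac{B, R \vdash \{\Gamma \wedge e; Q - M_c^1\}\ S_1\ \{\Gamma'; Q'\} \qquad B, R \vdash \{\Gamma \wedge \neg e; Q - M_c^2\}\ S_2\ \{\Gamma'; Q'\}}
        {B, R \vdash \{\Gamma; Q + M_e(e)\}\ \mathtt{if}(e)\ S_1\ \mathtt{else}\ S_2\ \{\Gamma'; Q'\}}\ (\text{Q:If}) \]

\[ \frac{q'_{xy}, q'_{yx} \in \mathbb{Q}_{\geq 0} \qquad \forall u.\ (q_{yu} = q'_{xu} + q'_{yu} \wedge q_{uy} = q'_{ux} + q'_{uy})}
        {B, R \vdash \{\Gamma[x/y]; Q + M_u + M_e(y)\}\ x \leftarrow y\ \{\Gamma; Q'\}}\ (\text{Q:Set}) \]

\[ \frac{\Gamma \models y \geq 0 \qquad U = \{u \mid \Gamma \models x + y \in [x, u]\} \qquad q'_{0y} = q_{0y} + \sum_{u \in U} q_{xu} - \sum_{v \notin U} q_{vx}}
        {B, R \vdash \{\Gamma[x/x+y]; Q + M_u + M_e(x+y)\}\ x \leftarrow x + y\ \{\Gamma; Q'\}}\ (\text{Q:IncP}) \]

\[ \frac{M = M_u + M_e(x \pm y) \qquad q'_{0y} = q_{0y} - \sum_{v} q_{vx} \qquad q'_{y0} = q_{y0} - \sum_{v} q_{xv}}
        {B, R \vdash \{\Gamma[x/x \pm y]; Q + M\}\ x \leftarrow x \pm y\ \{\Gamma; Q'\}}\ (\text{Q:Inc}) \]

\[ \frac{\Gamma \models y \geq 0 \qquad U = \{u \mid \Gamma \models x - y \in [u, x]\} \qquad q'_{y0} = q_{y0} + \sum_{u \in U} q_{ux} - \sum_{v \notin U} q_{xv}}
        {B, R \vdash \{\Gamma[x/x-y]; Q + M_u + M_e(x-y)\}\ x \leftarrow x - y\ \{\Gamma; Q'\}}\ (\text{Q:DecP}) \]

\[ \frac{\begin{array}{c}
         (\Gamma_f, Q_f, \Gamma'_f, Q'_f) \in \Delta(f) \qquad Loc = Locals(Q) \qquad \forall i \neq j.\ x_i \neq x_j \qquad c \in \mathbb{Q}_{\geq 0} \\
         Q = P + S \qquad Q' = P' + S \qquad U = Q_f[args/\vec{x}] \qquad U' = Q'_f[ret/r] \\
         \forall i \in dom(U).\ p_i = u_i \qquad \forall i \in dom(U').\ p'_i = u'_i \qquad \forall i \notin dom(U').\ p'_i = 0 \qquad \forall i \notin Loc.\ s_i = 0
         \end{array}}
        {B, R \vdash \{\Gamma_f[args/\vec{x}] \wedge \Gamma_{Loc}; Q + c + M_f\}\ r \leftarrow f(\vec{x})\ \{\Gamma'_f[ret/r] \wedge \Gamma_{Loc}; Q' + c - M_r\}}\ (\text{Q:Call}) \]

\[ \frac{\Sigma(f) = (\vec{y}, S_f) \qquad B, (\Gamma'_f; Q'_f) \vdash \{\Gamma_f[args/\vec{y}]; Q_f[args/\vec{y}]\}\ S_f\ \{\Gamma'; Q'\}}
        {(\Gamma_f, Q_f, \Gamma'_f, Q'_f) \in \Delta(f)}\ (\text{Q:Extend}) \]

\[ \frac{B, R \vdash \{\Gamma_2; Q_2\}\ S\ \{\Gamma'_2; Q'_2\} \qquad \Gamma_1 \models \Gamma_2 \qquad Q_1 \succeq_{\Gamma_1} Q_2 \qquad \Gamma'_2 \models \Gamma'_1 \qquad Q'_2 \succeq_{\Gamma'_2} Q'_1}
        {B, R \vdash \{\Gamma_1; Q_1\}\ S\ \{\Gamma'_1; Q'_1\}}\ (\text{Q:Weak}) \]

\[ \frac{\begin{array}{c}
         L = \{xy \mid \exists l_{xy} \in \mathbb{N}.\ \Gamma \models l_{xy} \leq |[x, y]|\} \qquad U = \{xy \mid \exists u_{xy} \in \mathbb{N}.\ \Gamma \models |[x, y]| \leq u_{xy}\} \\
         \forall i \in U.\ q'_i \geq q_i - r_i \qquad \forall i \in L.\ q'_i \geq q_i + p_i \qquad \forall i \notin U \cup L \cup \{0\}.\ q'_i \geq q_i \\
         q'_0 \geq q_0 + \sum_{i \in U} u_i r_i - \sum_{i \in L} l_i p_i
         \end{array}}
        {Q' \succeq_\Gamma Q}\ (\text{Relax}) \]


Figure 4. Inference rules of the quantitative analysis.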


the  statement  loop  S  is  used  as  break  postcondition  in  the  derivation 
for  S. 

Another interesting rule is Q:Call. It needs to account for the
changes to the stack caused by the function call, the arguments/return
value passing, and the preservation of local variables. We can
sum up the main ideas of the rule as follows.

•  The potential in the pre- and postcondition of the function
specification is equalized to its matching potential in the callee's
pre- and postcondition.

•  The potential of intervals $|[x, y]|$ is preserved across a function
call if x and y are local.

•  The unknown potentials after the call (e.g. $|[x, g]|$, with x local
and g global) are set to zero in the postcondition.

If x and y are local variables and f(x, y) is called, Q:Call splits
the potential of $|[x, y]|$ in two parts: one part to perform the
computation in the function f and one part to keep for later use after
the function call. This splitting is realized by the equations
$Q = P + S$ and $Q' = P' + S$. Arguments in the function precondition
$(\Gamma_f; Q_f)$ are named using a fixed vector args of names different
from all program variables. This prevents name conflicts and ensures
that the substitution $[args/\vec{x}]$ is meaningful. Symmetrically, we use
the unique name ret to represent the return value in the function's
postcondition $(\Gamma'_f; Q'_f)$.

The rule Q:Weak is the only rule that is not syntax directed. We
could integrate weakenings into every syntax directed rule but, for
the sake of efficiency, the implementation uses a simple heuristic
instead. The high-level idea of Q:Weak is the following: If we
have a sound judgement, then it is sound to add more potential to
the precondition and remove potential from the postcondition. The
concept of more potential is formalized by the relation $Q' \succeq_\Gamma Q$
that is defined in the rule Relax. This rule also deals with the
important task of transferring constant potential (represented by $q_0$)
to interval sizes and vice versa. If we can deduce from the logical
context that the interval size $|[x, y]|$ is at least a constant $\ell$,
that is $|[x, y]| \geq \ell$, then we can turn the potential
$q_{xy} \cdot |[x, y]|$ from the interval into the constant potential
$\ell \cdot q_{xy}$ and guarantee that we do not gain potential.
Conversely, if $|[x, y]| \leq u$ for a constant u then we can transfer
constant potential $u \cdot q_{xy}$ to the interval potential
$q_{xy} \cdot |[x, y]|$ without gaining potential.
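
The two transfers performed by Relax can be illustrated on a potential
function with a single interval. This is a minimal sketch under the
assumption that the potential has the shape $q_0 + q \cdot |[x, y]|$; the
function names and the parameters p and r (the amounts of coefficient
moved) are hypothetical and chosen only for this illustration.

```python
def potential(q0, q, size):
    """Potential q0 + q * |[x, y]| for an interval of the given size."""
    return q0 + q * size


def interval_to_constant(q0, q, p, lower):
    """Trade p units of interval coefficient for lower * p constant units.

    Does not gain potential when size >= lower and 0 <= p <= q.
    """
    return q0 + lower * p, q - p


def constant_to_interval(q0, q, r, upper):
    """Trade upper * r constant units for r units of interval coefficient.

    Does not gain potential when size <= upper and 0 <= upper * r <= q0.
    """
    return q0 - upper * r, q + r


# With |[x, y]| >= 4, moving 2 units of coefficient yields 8 constant units.
print(interval_to_constant(3, 5, 2, 4))
# With |[x, y]| <= 10, 20 constant units buy 2 units of coefficient.
print(constant_to_interval(30, 1, 2, 10))
```

In both directions the new potential is never larger than the old one on
any state allowed by the logical context, which is the property that the
relation $Q' \succeq_\Gamma Q$ has to guarantee.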

5.  Automatic  Inference  via  LP  Solving 

We  separate  the  search  of  a  derivation  in  two  steps.  As  a  first  step  we 
go  through  the  functions  of  the  program  and  apply  inductively  the 
derivation  rules  of  the  automatic  amortized  analysis.  This  is  done  in 
a  bottom-up  way  for  each  strongly  connected  component  (SCC)  of 
the  call  graph.  During  this  process  our  tool  uses  symbolic  names  for 
the  rational  coefficients  qi  in  the  rules.  Each  time  a  linear  constraint 
must  be  satisfied  by  these  coefficients,  it  is  recorded  in  a  global  list 
for  the  SCC  using  the  symbolic  names.  We  reuse  the  constraint  list 
for  every  call  from  outside  the  SCC. 

We  then  feed  the  collected  constraints  to  an  off-the-shelf  LP 
solver  (currently  CLP  [19]).  If  the  solver  successfully  finds  a 
solution,  we  know  that  a  derivation  exists  and  extract  the  values  for 
the  initial  Q  from  the  solver  to  get  a  resource  bound  for  the  program. 
To  get  a  full  derivation,  we  extract  the  complete  solution  from  the 
[Figure 5: derivation tree for the program

    loop if (x >= 10) (x = x - 10; tick(5)) else break

built from the rules Q:DecP, Q:Tick, Q:Seq, Q:Break, Q:If and Q:Loop,
with Q:Weak applied in between, followed by the generated constraints.]

Linear Objective Function: $1 \cdot q_{x,0} + 10000 \cdot q_{0,x} + 11 \cdot q_{x,10} + 9990 \cdot q_{10,x}$

Constant Objective Function: $1 \cdot q_0 + 11 \cdot q_{0,10}$


Figure 5. An example derivation as produced by C4B. The constraints are resolved by an off-the-shelf LP solver.


solver  and  apply  it  to  the  symbolic  names  qi  of  the  coefficients  in 
the  derivation.  If  the  LP  solver  fails  to  find  a  solution,  an  error  is 
reported. 

Figure 5 contains an example derivation as produced by C4B.
The upper case letters (with optional superscript) such as $Q^{de}$ are
families of variables that are later part of the constraint system
that is passed to the LP solver. For example $Q^{de}$ stands for the
potential function
$q^{de}_0 + q^{de}_{x,0}|[x, 0]| + q^{de}_{0,x}|[0, x]| + q^{de}_{x,10}|[x, 10]| + q^{de}_{10,x}|[10, x]| + q^{de}_{0,10}|[0, 10]|$,
where the variables such as $q^{de}_{0,10}$ are
yet unknown and later instantiated by the LP solver.

In general, the weakening rule can be applied after every syntax
directed rule. However, it can be left out in practice at some places
to increase the efficiency of the tool. The weakening operation
$\succeq_\Gamma$ is defined by the rule Relax. It is parameterized by a
logical context that is used to gather information on interval sizes.
For example,

\[
\begin{aligned}
P^{de} \succeq_{(\cdot)} P^{we} \;\equiv\;& p^{we}_{0,10} \leq p^{de}_{0,10} + u_{0,10} - v_{0,10} \\
\wedge\;& p^{we}_{0} \leq p^{de}_{0} - 10 \cdot u_{0,10} + 10 \cdot v_{0,10} \\
\wedge\;& \forall (\alpha, \beta) \neq (0, 10).\ p^{we}_{\alpha,\beta} \leq p^{de}_{\alpha,\beta}.
\end{aligned}
\]

The other rules are syntax directed and applied inductively. For
example, the outermost expression is a loop, so we use the rule
Q:Loop at the root of the derivation tree. At this point, we do
not know yet whether a loop invariant exists. But we produce the
constraints $Q^{lo} = P^{lo}$. These constraints express the fact that the
potential functions before and after the loop body are equal and thus
constitute an invariant.

After the constraint generation, the LP solver is provided with
an objective function to be minimized. We wish to minimize the
initial potential, which is a resource bound on the whole program.
Here it is given by Q. Moreover, we would like to express that
minimization of linear potential such as $q_{10,x}|[10, x]|$ takes priority
over minimization of constant potential such as $q_{0,10}|[0, 10]|$.

To get a tight bound, we use modern LP solvers that allow
constraint solving and minimization at the same time: First we
consider our initial constraint set as given in Figure 5 and ask the
solver to find a solution that satisfies the constraints and minimizes
the linear expression
$1 \cdot q_{x,0} + 10000 \cdot q_{0,x} + 11 \cdot q_{x,10} + 9990 \cdot q_{10,x}$.
The penalties given to certain factors are used to prioritize certain
intervals. For example, a bound with $[10, x]$ will be preferred to
another with $[0, x]$ because $|[10, x]| \leq |[0, x]|$. The LP solver now
returns a solution of the constraint set and an objective value. The
solver also memorizes the optimization path that led to the optimal
solution. In this case, the objective value would be 5000 since the LP
solver assigns $q_{0,x} = 0.5$ and $q_* = 0$ otherwise. We now add the
constraint
$1 \cdot q_{x,0} + 10000 \cdot q_{0,x} + 11 \cdot q_{x,10} + 9990 \cdot q_{10,x} \leq 5000$
to our constraint set and ask the solver to optimize the objective
function $q_0 + 11 \cdot q_{0,10}$. This happens in almost no time in practice.
The final solution is $q_{0,x} = 0.5$ and $q_* = 0$ otherwise. Thus the
derived bound is $0.5|[0, x]|$.
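
The derived bound and the first objective value can be cross-checked by
executing the example program directly. The sketch below is not the LP
encoding used by the tool: it assumes that the loop of Figure 5 is guarded
by x >= 10 and uses the tick metric, and the names figure5_cost,
figure5_bound and linear_objective are hypothetical helpers.

```python
from fractions import Fraction


def interval(a, b):
    """Size |[a, b]| = max(0, b - a) of the interval between a and b."""
    return max(0, b - a)


def figure5_cost(x):
    """Ticks spent by: loop if (x >= 10) (x = x - 10; tick(5)) else break."""
    ticks = 0
    while True:              # Clight-style loop, left only through break
        if x >= 10:
            x -= 10
            ticks += 5       # tick(5)
        else:
            break
    return ticks


def figure5_bound(x):
    """The derived bound 0.5 * |[0, x]|."""
    return Fraction(1, 2) * interval(0, x)


# Penalty weights of the linear objective function quoted in the text.
LINEAR_WEIGHTS = {("x", 0): 1, (0, "x"): 10000, ("x", 10): 11, (10, "x"): 9990}


def linear_objective(q):
    """Weighted sum of the interval coefficients in q (absent ones are 0)."""
    return sum(w * q.get(iv, 0) for iv, w in LINEAR_WEIGHTS.items())


print(figure5_cost(20), figure5_bound(20))
print(linear_objective({(0, "x"): Fraction(1, 2)}))
```

The last line evaluates the objective for the assignment $q_{0,x} = 0.5$
reported in the text; it does not search for an optimum, which is the
job of the LP solver.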

A  notable  advantage  of  the  LP-based  approach  compared  to  SMT- 
solver-based  techniques  is  that  a  satisfying  assignment  is  a  proof 
certificate  instead  of  a  counter  example.  To  provide  high-assurance 
bounds,  this  certificate  can  be  checked  in  linear  time  by  a  simple 
validator. 

6.  Logical  State  and  User  Interaction 

While  complete  automation  is  desirable,  it  is  not  always  possible 
since  the  problem  of  bound  derivation  is  undecidable.  In  this  section 
we  present  a  new  technique  to  derive  complex  resource  bounds  semi- 
automatically  by  leveraging  our  automation.  Our  goal  is  to  develop 
an  interface  between  bound  derivation  and  established  qualitative 
verification  techniques. 

When the resource bound of a program depends on the contents
of the heap, or is non-linear (e.g. logarithmic, exponential), we
introduce a logical state using auxiliary variables. Auxiliary variables
guide C4B during bound derivation but they do not change the
behavior of the program.

More precisely, the technique consists of the following steps.
First, a program P that fails to be analyzed automatically is enriched
by auxiliary variables $\vec{x}$ and assertions to form a program
$P_l(\vec{x})$. Second, an initial value $X(\sigma)$ for the logical
variables is selected to satisfy the proposition:

\[ \forall n\, \sigma\, \sigma'.\ (\sigma, P_l(X(\sigma))) \Downarrow_n \sigma' \implies \exists n' \leq n.\ (\sigma, P) \Downarrow_{n'} \sigma'. \qquad (*) \]

Since the annotated program and the original one are usually
syntactically close, the proof of this result goes by simple induction
on the resource-aware evaluation judgement. Third, using existing
automation tools, a bound $B(\vec{x})$ for $P_l(\vec{x})$ is derived.
Finally this bound, instantiated with X, gives the final resource bound
for the program P.

This idea is illustrated by the program in Figure 6. The parts
of the code in blue are annotations that were added to the original
program text. The top-level loop increments a binary counter k
times. A naive analysis of the algorithm yields the quadratic bound
$k \cdot N$. However, the algorithm is in fact linear and its cost is bounded
     1  logical state invariant {na = #1(a)}
     2  while (k > 0) {
     3    x=0;
     4    while (x < N && a[x] == 1) {
     5      assert(na > 0);
     6      a[x]=0; na--;
     7      tick(1); x++; }
     8    if (x < N) { a[x]=1; na++; tick(1); }
     9    k--;
    10  }


Figure 6. Assisted bound derivation using logical state. We write
$\#_1(a)$ for $\#\{i \mid 0 \leq i < N \wedge a[i] = 1\}$ and use the tick
metric. The derived bound is $2|[0, k]| + |[0, na]|$.


    logical state invariant {lg > log2(h - l)}
    bsearch(x,l,h,lg) {
      if (h-l > 1) {
        assert(lg > 0);
        m = l + (h-l)/2;
        lg--; if (a[m]>x) h=m; else l=m;
        tick(M_bsearch);
        l = bsearch(x,l,h,lg);
        tick(-M_bsearch);
      } else return l;
    }


Figure 7. Assisted bound derivation using logical state. We write
log2(x) for the integer part of logarithm of x in base 2. The
semi-automatically derived bound is $|[0, lg]|$.

by $2k + \#_1(a)$ where $\#_1(a)$ denotes the number of one entries in
the array a. Since this number depends on the heap contents, no
tool available for C is able to derive the linear bound. However,
it can be inferred by our automated tool if a logical variable na
is introduced. This logical variable is a reification of the number
$\#_1(a)$ in the program. For example, on line 6 of the example we
are setting a[x] to 0 and because of the condition we know that
this array entry was 1. To reflect this change on $\#_1(a)$, the logical
variable na is decremented. Similarly, on line 8, an array entry
which was 0 becomes 1, so na is incremented. To complete step
2 of the systematic procedure described above, we must show that
the extra assertion na > 0 on line 5 cannot fail. We do it by proving
inductively that $na = \#_1(a)$ and remarking that since a[x] == 1 is
true, we must have $\#_1(a) > 0$, thus the assertion na > 0 never fails.
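
The role of the logical variable can be reproduced by running the
annotated binary counter and comparing the tick count with the bound
$2|[0, k]| + |[0, na]|$. This is an illustrative transliteration of the
loop of Figure 6 into Python, not the analysis itself; the function names
run_counter and check_bound are hypothetical, and N is taken to be the
length of the list.

```python
def run_counter(a, k):
    """Increment the binary counter stored in list a (least significant
    bit first) k times. Returns (ticks, na) with na the logical variable."""
    n = len(a)
    na = sum(1 for bit in a if bit == 1)   # invariant: na = #1(a)
    ticks = 0
    while k > 0:
        x = 0
        while x < n and a[x] == 1:
            assert na > 0                  # added assertion, never fails
            a[x] = 0
            na -= 1
            ticks += 1
            x += 1
        if x < n:
            a[x] = 1
            na += 1
            ticks += 1
        k -= 1
    return ticks, na


def check_bound(width, max_k):
    """Exhaustively check ticks <= 2k + #1(a) for all counters of a width."""
    for m in range(2 ** width):
        for k in range(max_k + 1):
            a = [(m >> i) & 1 for i in range(width)]
            ones = sum(a)
            ticks, na = run_counter(a, k)
            if ticks > 2 * k + ones or na != sum(a):
                return False
    return True


print(run_counter([1, 1, 0], 1))
print(check_bound(4, 12))
```

The amortized argument is visible in the check: every increment that
clears t one entries costs t + 1 ticks but lowers na by t - 1, so the
potential 2k + na always covers the cost.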

Another  simple  example  is  given  in  Figure  7  where  a  logarithmic 
bound  on  the  stack  consumption  of  a  binary  search  program  is 
proved  using  logical  variable  annotations.  Once  again,  annotations 
are  in  blue  in  the  program  text.  In  this  example,  to  ease  the  proof 
of  equivalence  between  the  annotated  program  and  the  original  one, 
we use the inequality $lg > \log_2(h - l)$ as invariant. This allows a
simpler proof because, when working with integer arithmetic, it is
not always the case that $\log_2(x - x/2) = \log_2(x) - 1$.
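
A quick enumeration shows where the integer identity breaks. The sketch
assumes that log2 denotes the integer part of the base-2 logarithm and
that x/2 is integer division, as in C for non-negative values; the helper
name ilog2 is hypothetical.

```python
def ilog2(x):
    """Integer part of log2(x) for x >= 1."""
    return x.bit_length() - 1


# Values of x for which log2(x - x/2) differs from log2(x) - 1.
counterexamples = [x for x in range(2, 64)
                   if ilog2(x - x // 2) != ilog2(x) - 1]
print(counterexamples)
```

For instance x = 3 gives log2(3 - 1) = 1 while log2(3) - 1 = 0, so an
invariant stated as an equality would not be preserved by the halving
step.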

Generally, we observed that because the instrumented program
is structurally the same as the original one, it is enough to prove that
the added assertions never fail in order to show that the two programs
satisfy the proposition (*). This can usually be piggybacked on standard
static-analysis tools.

7.  Soundness  Proof 

The  soundness  of  the  analysis  builds  on  a  new  cost  semantics  for 
Clight  and  an  extended  quantitative  logic.  Using  these  two  tools,  the 
soundness  of  the  automatic  analysis  described  in  Section  3  is  proved 
by  a  translation  morphism  to  the  logic. 


The  main  parts  of  the  soundness  proof  are  formalized  with  Coq 
and  available  for  download.  The  full  definitions  of  the  cost  semantics 
and  the  quantitative  Hoare  logic,  and  more  details  on  the  soundness 
proof  can  be  found  in  the  extended  version  of  this  article. 

Cost  Semantics  for  Clight.  To  base  the  soundness  proof  on  a 
formal  ground,  we  start  by  defining  a  new  cost-aware  operational 
semantics  for  Clight.  Clight’s  operational  semantics  is  based  on 
small-step  transitions  and  continuations.  Expressions — which  do 
not  have  side  effects — are  evaluated  in  a  big-step  fashion. 

A program state $\sigma = (\theta, \gamma)$ is composed of two maps from
variable names to integers. The first map, $\theta : Locals \to \mathbb{Z}$,
assigns integers to local variables of a function, and the second map,
$\gamma : Globals \to \mathbb{Z}$, gives values to global variables of the
program. In this article, we assume that all values are integers but in
the implementation we support all data types of Clight. The evaluation
function $[\![\cdot]\!]$ maps an expression $e \in E$ to a value
$[\![e]\!]_\sigma \in \mathbb{Z}$ in the program state $\sigma$. We write
$\sigma(x)$ to obtain the value of x in program state $\sigma$. Similarly,
we write $\sigma[x \mapsto v]$ for the state based on $\sigma$ where
the value of x is updated to v.

The small-step semantics is standard, except that it tracks the
resource consumption of a program. The semantics is parametric
in the resource of interest for the user of our system. We achieve
this independence by parameterizing evaluations with a resource
metric M, a tuple of rational numbers and two maps. Each of
these parameters indicates the amount of resource consumed by
a corresponding step in the semantics. Resources can be released by
using a negative cost. Two sample rules for update and tick follow.

\[ \frac{\sigma' = \sigma[x \mapsto [\![e]\!]_\sigma]}
        {(\sigma, x \leftarrow e, K, c) \to (\sigma', \mathtt{skip}, K, c - M_u - M_e(e))}\ (\text{U})
   \qquad
   \frac{}{(\sigma, \mathtt{tick}(n), K, c) \to (\sigma, \mathtt{skip}, K, c - M_t(n))} \]

The rules have as implicit side condition that c is non-negative. This
makes it possible to detect a resource crash as a stuck configuration
where c < 0.
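
The two sample rules can be mimicked by a tiny step function. This is a
simplified sketch, not the Coq or Clight semantics: the continuation K is
omitted, statements are encoded as tuples, expressions as Python
functions over the state, and the metric dictionary with the keys "tick",
"update" and "expr" is a hypothetical stand-in for $M_t$, $M_u$ and $M_e$.

```python
def step(state, stmt, c, metric):
    """Perform one step for the update and tick rules.

    Returns (state', "skip", c') or None when the configuration is stuck
    because the resource counter c is negative.
    """
    if c < 0:
        return None   # implicit side condition: c must be non-negative
    kind = stmt[0]
    if kind == "tick":
        n = stmt[1]
        return state, "skip", c - metric["tick"](n)
    if kind == "assign":
        _, x, expr = stmt
        new_state = dict(state)
        new_state[x] = expr(state)     # sigma[x -> [[e]]sigma]
        return new_state, "skip", c - metric["update"] - metric["expr"](expr)
    raise ValueError("unsupported statement: %r" % (kind,))


# Tick metric: only tick(n) consumes resources.
TICK_METRIC = {"tick": lambda n: n, "update": 0, "expr": lambda e: 0}

first = step({"x": 1}, ("tick", 5), 3, TICK_METRIC)
print(first)                                     # counter drops below zero
print(step(first[0], ("tick", 1), first[2], TICK_METRIC))   # stuck
```

Starting with 3 resource units, tick(5) leaves the counter at -2, and no
further step is possible from that configuration, which models the
resource crash described above.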

Quantitative Hoare Logic. To prove the soundness of C4B we
found it useful to go through an intermediate step using a quantitative
Hoare logic. This logic is at the same time a convenient semantic
tool and a clean way to interface manual proofs with our automation.
We base it on a logic for stack usage [15], add support for arbitrary
resources, and simplify the handling of auxiliary state.

We define quantitative Hoare triples as $B; R \vdash_L \{Q\}\ S\ \{Q'\}$
where B, R, Q, and Q' are maps from program states to an element
of $\mathbb{Q}_{\geq 0} \cup \{\infty\}$ that represents an amount of resources
available. The assertions B and R are postconditions for the case in
which the block S exits by a break or return statement. Additionally, R
depends on the return value of the current function. The meaning of the
triple $\{Q\}\ S\ \{Q'\}$ is as follows: If S is executed with starting
state $\sigma$, the empty continuation, and at least $Q(\sigma)$ resource
units available then the evaluation does not run out of resources and
there are at least $Q'(\sigma')$ resources left if the evaluation
terminates in $\sigma'$. The logic rules are similar to the ones in
previous work and generalized to account for the cost introduced by our
cost-aware semantics.

Finally, we define a strong compositional continuation-based
soundness for triples and prove the validity of all the rules in Coq.
The full version of this paper [16] provides explanations for the
rules and a thorough overview of our soundness proof.

The  Soundness  Theorem.  We  use  the  quantitative  logic  as  the 
target  of  a  translation  function  for  the  automatic  derivation  system. 
This  reveals  two  orthogonal  aspects  of  the  proof:  on  one  side,  it 
relies  on  amortized  reasoning  (the  quantitative  logic  rules),  and  on 
the  other  side,  it  uses  combinatorial  properties  of  our  linear  potential 
functions  (the  automatic  analysis  rules). 

Technically, we define a translation function T such that if a
judgement J in the automatic analysis is derivable, T(J) is
derivable in the quantitative logic. By using T to translate derivations
of the automatic analysis to derivations in the quantitative logic we can
directly obtain a certified resource bound for the analyzed program.

t09:
    i=1; j=0;
    while (j<x) {
      j++;
      if (i>=4)
        i=1, tick(40);
      else i++;
      tick(1); }

t19:
    while (i>100) {
      i--; tick(1);
    }
    i += k+50;
    while (i>=0) {
      i--; tick(1);
    }

t30:
    while (x>0) {
      x--;
      t=x, x=y, y=t;
      tick(1);
    }

t15:
    assert(y>=0);
    while (x > y) {
      x -= y+1;
      for (z=y; z>0; z--)
        tick(1);
      tick(1);
    }

t13:
    while (x>0) {
      x--;
      if (*) y++;
      else
        while (y>0)
          y--, tick(1);
      tick(1); }

Example  C4B                      Rank               LOOPUS
t09      11|[0,x]|                23·x - 14          41 max(x, 0)
t19      50 + |[-1,i]| + |[0,k]|  54 + k + i         max(i-100, 0) + max(k+i+51, 0)
t30      |[0,x]| + |[0,y]|        -                  -
t15      |[0,x]|                  2 + 2x - y         -
t13      2|[0,x]| + |[0,y]|       0.5·y^2 + yx ...   2 max(x, 0) + max(y, 0)

Figure 8. Comparison of resource bounds derived by different tools on several examples with linear bounds.

The translation of an assertion $(\Gamma; Q)$ in the automatic analysis
is defined by

$$T(\Gamma; Q) := \lambda\sigma.\ \Gamma(\sigma) + \Phi_Q(\sigma),$$

where we write $\Phi_Q$ for the unique linear potential function defined
by the quantitative annotation $Q$. The logical context $\Gamma$ is implicitly
lifted to a quantitative assertion by mapping a state $\sigma$ to $0$ if
$\Gamma(\sigma)$ holds and to $\infty$ otherwise. These definitions let us
translate the judgement $J := B; R \vdash \{P\}\ S\ \{P'\}$ by

$$T(J) := T(B); T(R) \vdash_L \{T(P)\}\ S\ \{T(P')\}.$$
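
The translation of a single assertion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shape of the potential function (a constant plus a weighted sum of interval sizes |[a,b]|, with endpoints that are variables or integer constants) is assumed here, and the names `potential`, `lift`, and `translate` are hypothetical rather than taken from the Coq development.

```python
import math


def interval(a, b):
    """Size |[a,b]| of the interval [a,b], i.e. max(b - a, 0)."""
    return max(b - a, 0)


def value(term, state):
    """Evaluate an interval endpoint: a variable name or an integer constant."""
    return state[term] if isinstance(term, str) else term


def potential(q0, coeffs):
    """Assumed shape of Phi_Q: q0 + sum of q_(a,b) * |[a,b]|."""
    def phi(state):
        return q0 + sum(q * interval(value(a, state), value(b, state))
                        for (a, b), q in coeffs.items())
    return phi


def lift(gamma):
    """Lift a logical context: 0 where it holds, infinity elsewhere."""
    return lambda state: 0 if gamma(state) else math.inf


def translate(gamma, q0, coeffs):
    """T(Gamma; Q) = lambda sigma. Gamma(sigma) + Phi_Q(sigma)."""
    lifted, phi = lift(gamma), potential(q0, coeffs)
    return lambda state: lifted(state) + phi(state)
```

For instance, `translate(lambda s: s["y"] >= 0, 1, {(0, "x"): 2})` maps a state to 1 + 2|[0,x]| when y is non-negative and to infinity otherwise, so a state that violates the logical context can never satisfy the resource requirement.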

The soundness of the automatic analysis can now be stated formally
with the following theorem.

Theorem 1 (Soundness of the automatic analysis). If $J$ is a judgement
derived by the automatic analysis, then $T(J)$ is a quantitative
Hoare triple derivable in the quantitative logic.

The  proof  of  this  theorem  is  constructive  and  maps  each  rule  of  the 
automatic  analysis  directly  to  its  counterpart  in  the  quantitative  logic. 
The  trickiest  parts  are  the  translations  of  the  rules  for  increments  and 
decrements  and  the  rule  Q:Weak  for  weakening  because  they  make 
essential  use  of  the  algebraic  properties  of  the  potential  functions. 

8.  Experimental  Evaluation 

We  have  experimentally  evaluated  the  practicality  of  our  automatic 
amortized  analysis  with  more  than  30  challenging  loop  and  recursion 
patterns  from  open-source  code  and  the  literature  [20-22].  A  full 
list  of  examples  is  given  in  the  extended  version  [16]. 

Table 1.  Comparison of C4B with other automatic tools.

                KoAT   Rank   LOOPUS   SPEED   C4B
#bounds            9     24       20      14    32
#lin. bounds       9     21       20      14    32
#best bounds       0      0       11      14    29
#tested           14     33       33      14    33

Figure 8 shows five representative loop patterns from the evaluation.
Example t09 is a loop that performs an expensive operation
every 4 steps. C4B is the only tool able to amortize this cost over the
input parameter x. Example t19 demonstrates the compositionality
of the analysis. The program consists of two loops that decrement
a variable i. In the first loop, i is decremented down to 100 and
in the second loop i is decremented further down to -1. However,
between the loops we assign i += k+50. So in total the program
performs at most 50 + |[-1,i]| + |[0,k]| ticks. Our analysis finds this tight
bound because our amortized analysis naturally takes into account
the relation between the two loops. Example t30 decrements both
input variables x and y down to zero in an unconventional way. In
the loop body, first x is decremented by one, then the values of the
variables x and y are switched using the local variable t as a buffer.
Our analysis infers the tight bound |[0,x]| + |[0,y]|. Sometimes
we need some assumptions on the inputs in order to derive a bound.
Example t15 is such a case. We assume here that the input variable y
is non-negative and write assert(y>=0). The assignment x -= y+1
in the loop is split into x-- and x -= y. If we enter the loop then we
know that x > 0, so we can obtain constant potential from x--.
Then we know that x ≥ y ≥ 0; as a consequence, we can share the
potential of |[0,x]| between |[0,x]| and |[0,y]| after x -= y.
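
The bounds quoted for t19, t30, and t15 can be checked empirically by replaying the loops with a tick counter. The sketch below is a plain Python transcription of the three programs from Figure 8, not the formal cost semantics; the function names and the `BOUNDS` table are hypothetical helpers introduced only for this illustration.

```python
def interval(a, b):
    """Size |[a,b]| of the interval [a,b], i.e. max(b - a, 0)."""
    return max(b - a, 0)


def t19(i, k):
    """Two loops on i with the assignment i += k+50 in between."""
    ticks = 0
    while i > 100:
        i -= 1
        ticks += 1
    i += k + 50
    while i >= 0:
        i -= 1
        ticks += 1
    return ticks


def t30(x, y):
    """Decrement x, then swap x and y (the C code uses a buffer t)."""
    ticks = 0
    while x > 0:
        x -= 1
        x, y = y, x
        ticks += 1
    return ticks


def t15(x, y):
    """Each iteration removes y+1 from x and costs y+1 ticks."""
    assert y >= 0
    ticks = 0
    while x > y:
        x -= y + 1
        ticks += y      # for (z=y; z>0; z--) tick(1);
        ticks += 1      # trailing tick(1)
    return ticks


# Bounds as listed for C4B in Figure 8.
BOUNDS = {
    t19: lambda i, k: 50 + interval(-1, i) + interval(0, k),
    t30: lambda x, y: interval(0, x) + interval(0, y),
    t15: lambda x, y: interval(0, x),
}


def within_bound(program, a, b):
    """True if the measured tick count does not exceed the listed bound."""
    return program(a, b) <= BOUNDS[program](a, b)
```

Running `within_bound` over a grid of inputs never finds a violation, and for t19 the bound is met exactly whenever i is at least -1 and k is non-negative.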

Example t13 shows how amortization can be used to find linear
bounds for nested loops. The outer loop is iterated |[0,x]| times.
In the conditional, we either (the branching condition is arbitrary)
increment the variable y or we execute an inner loop in which y
is counted back to 0. C4B computes a tight bound. The extended
version also contains a discussion of the automatic bound derivation
for the Knuth-Morris-Pratt algorithm for string search. C4B finds
the tight linear bound 1 + 2|[0,n]|.
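
The amortization argument for t13 can be observed by simulating the program with an arbitrary branching policy. In the Python sketch below, the callback `choose` is a hypothetical stand-in for the nondeterministic condition `*`; the rest is a direct transcription of the loop from Figure 8 together with its listed bound.

```python
def interval(a, b):
    """Size |[a,b]| of the interval [a,b], i.e. max(b - a, 0)."""
    return max(b - a, 0)


def t13(x, y, choose):
    """Simulate t13; choose() decides the arbitrary branch `if (*)`."""
    ticks = 0
    while x > 0:
        x -= 1
        if choose():
            y += 1              # store potential for a later inner loop
        else:
            while y > 0:        # inner loop pays with the stored potential
                y -= 1
                ticks += 1
        ticks += 1
    return ticks


def bound(x, y):
    """Bound listed for t13 in Figure 8: 2|[0,x]| + |[0,y]|."""
    return 2 * interval(0, x) + interval(0, y)
```

Every increment of y happens in an outer iteration that has already been counted, so the inner loop can run at most |[0,y]| plus the number of outer iterations in total; this is why the bound stays linear although the loops are nested.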

To compare our tool with existing work, we focused on loop
bounds and used a simple metric that counts the number of back edges
(i.e., number of loop iterations) that are followed in the execution
of the program because most other tools only bound this specific
cost. In Figure 8, we show the bounds we derived (C4B) together
with the bounds derived by LOOPUS [38] and Rank [3]. We also
contacted the authors of SPEED but have not been able to obtain
this tool. KoAT [13] and PUBS [1] currently cannot operate on C
code and the examples would need to be manually translated into
a term-rewriting system to be analyzed by these tools. For Rank it
is not completely clear how the computed bound relates to the C
program since the computed bound is for transitions in an automaton
that is derived from the C code. For instance, the bound 2 y - x
that is derived for t08 only applies to the first loop in the program.

Table  1  summarizes  the  results  of  our  experiments  presented  in 
Appendix  A.  It  shows  for  each  tool  the  number  of  derived  bounds 
(#bounds),  the  number  of  asymptotically  tight  bounds  (#lin.  bounds), 
the  number  of  bounds  with  the  best  constant  factors  in  comparison 
with  the  other  tools  (#best  bounds),  and  the  number  of  examples 
that  we  were  able  to  test  with  the  tool  (#tested).  Since  we  were 
not  able  to  run  the  experiments  for  KoAT  and  SPEED,  we  simply 
used  the  bounds  that  have  been  reported  by  the  authors  of  the 
respective  tools.  The  results  show  that  our  automatic  amortized 
analysis  outperforms  the  existing  tools  on  our  example  programs. 
However,  this  experimental  evaluation  has  to  be  taken  with  a  grain 
of salt. Existing tools complement C4B since they can derive
polynomial  bounds  and  support  more  features  of  C.  We  were 
particularly  impressed  by  LOOPUS  which  is  very  robust,  works 
on  large  C  files,  and  derives  very  precise  bounds. 

Table 2.  Derived bounds for functions from cBench.

Function          LoC   Bound                    Time (s)
adpcm_coder       145   1 + |[0,N]|              0.6
adpcm_decoder     130   1 + |[0,N]|              0.2
BF_cfb64_enc      151   1 + 2|[-1,N]|            0.7
BF_cbc_enc        180   2 + 0.25|[-8,N]|         1.0
mad_bit_crc       145   61.19 + 0.19|[-1,N]|     0.4
mad_bit_read       65   1 + 0.12|[0,N]|          0.05
MD5Update         200   133.95 + 1.05|[0,N]|     1.0
MD5Final          195   141                      0.22
sha_update         98   2 + 3.55|[0,N]|          1.2
PackBitsDecode     61   1 + 65|[-129,cc]|        0.6
KMPSearch          20   1 + 2|[0,n]|             0.1
ycc_rgb_conv       66   nr · nc                  0.1
uv_decode          31   log2(UV_NVS) + 1         0.1

Table 2 contains a compilation of the results of our experiments
with the cBench benchmark suite. It shows a representative list of
automatically derived function bounds. In total we analyzed more
than 2900 lines of code. In the LoC column we not only count the
lines of the analyzed function but also the ones of all the functions it
calls. We analyzed the functions using a metric that assigns a cost of 1
to all the back edges in the control flow (loops and function calls).
The bounds for the functions ycc_rgb_conv and uv_decode have
been inferred with user interaction as described in Section 6. The
most challenging functions for C4B have unrolled loops where many
variables are assigned. This stresses our analysis because the number
of LP variables grows quadratically in the number of program variables.
Even on these stressful examples, the analysis finished in less than
2 seconds. For example, the sha_update function is composed of
one loop calling two helper functions that in turn have 6 and 1 inner
loops. In the analysis of the SHA algorithm, the compositionality
of our analysis is essential to get a tight bound since loops on the
same index are sequenced 4 and 2 times without resetting it. All
other tools derive much larger constant factors.
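
The effect of sequencing loops on one index without resetting it can be illustrated with a small back-edge counter. The Python sketch below is a hypothetical stand-in, not the sha_update code: the loop limits and the two bound functions are assumptions chosen only to contrast a bound that tracks the shared index with one that treats each loop in isolation.

```python
def sequenced_loops(limits):
    """Run one loop per limit on a shared index i that is never reset.

    Returns the number of back edges (loop iterations) followed.
    """
    i = 0
    back_edges = 0
    for limit in limits:
        while i < limit:
            i += 1
            back_edges += 1
    return back_edges


def shared_index_bound(limits):
    """Bound that tracks i across the loops: |[0, largest limit]|."""
    return max(max(limits, default=0), 0)


def per_loop_bound(limits):
    """Cruder bound that treats every loop as if i restarted at 0."""
    return sum(max(limit, 0) for limit in limits)
```

With the assumed limits `[20, 40, 60, 80]`, the loops follow 80 back edges in total; the bound that tracks the index matches this, while summing independent per-loop bounds gives 200.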

With  our  formal  cost  semantics,  we  can  run  our  examples  for 
different  inputs  and  measure  the  cost  to  compare  it  to  our  derived 
bound.  Figure  9  shows  such  a  comparison  for  Example  t08,  a  variant 
of  t08a  from  Section  3.  One  can  see  that  the  derived  constant  factors 
are  the  best  possible  if  the  input  variable  x  is  non-negative. 

9.  Limitations 

Our implementation does not currently support all of Clight. Programs
with function pointers, goto statements, continue statements,
and pointers to stack-allocated variables cannot be analyzed automatically.
While these limitations concern the current implementation,
our technique is in principle capable of handling them.

For  the  sake  of  simplicity,  the  automated  system  described  here 
is  restricted  to  finding  only  linear  bounds.  However,  the  amortized 
analysis  technique  was  shown  to  work  with  polynomial  bounds  [25]; 
we  leave  this  extension  of  our  system  as  future  work. 

Even certain linear programs cannot be analyzed automatically
by C4B. This is usually the case for programs that rely on heap
invariants (like null-terminated C strings), for programs in which
resource usage depends on the result of non-linear operations (like
% or *) in a non-trivial way, or for programs whose termination can
only be proved by complex path-sensitive reasoning.

10.  Related  Work 

Our  work  has  been  inspired  by  type-based  amortized  resource 
analysis  for  functional  programs  [23,  26,  28].  Here,  we  present 
the  first  automatic  amortized  resource  analysis  for  C.  None  of  the 
existing  techniques  can  handle  the  example  programs  we  describe 
in  this  work.  The  automatic  analysis  of  realistic  C  programs  is 
enabled  by  two  major  improvements  over  previous  work.  First,  we 
extended  the  analysis  system  to  associate  potential  with  not  just 
individual  program  variables  but  also  multivariate  intervals  and, 
more generally, auxiliary variables. In this way, we solved the
long-standing open problem of extending automatic amortized resource
analysis to compute bounds for programs that loop on (possibly
negative) integers without decreasing one individual number in each
iteration. Second, for the first time, we have combined an automatic
amortized analysis with a system for interactively deriving bounds.
In particular, recent systems [24] that deal with integers and arrays
cannot derive bounds that depend on values in mutable locations,
possibly negative integers, or on differences between integers.

Figure 9.  The automatically derived bound 1.33|[x,y]| + 0.33|[0,x]| (blue lines) and the measured runtime cost (red crosses) for Example t08. For x ≥ 0 the bound is tight.

A  recent  project  [15]  has  implemented  and  verified  a  quantitative 
logic  to  reason  about  stack-space  usage,  and  modified  the  verified 
CompCert C compiler to translate C-level bounds to x86 stack bounds.
This  quantitative  logic  is  also  based  on  the  potential  method  but  has 
very  rudimentary  support  for  automation.  It  is  not  based  on  efficient 
LP  solving  and  cannot  automatically  derive  symbolic  bounds.  In 
contrast,  our  main  contribution  is  an  automatic  amortized  analysis 
for  C  that  can  derive  parametric  bounds  for  loops  and  recursive 
functions  fully  automatically.  We  use  a  more  general  quantitative 
Hoare  logic  that  is  parametric  over  the  resource  of  interest. 

There exist many tools that can automatically derive loop and
recursion bounds for imperative programs such as SPEED [20, 22],
KoAT [13], PUBS [1], Rank [3], ABC [10] and LOOPUS [38, 40].
These tools are based on abstract interpretation-based invariant
generation and/or term rewriting techniques, and they derive impressive
results on realistic software. The importance of amortization to
derive tight bounds is well known in the resource analysis community
[4, 30, 38]. Currently, the only other available tools that can be
directly applied to C code are Rank and LOOPUS. As demonstrated,
C4B is more compositional than the aforementioned tools. Our
technique is the only one that can generate resource specifications
for functions, deal with resources like memory that might become
available, generate proof certificates for the bounds, and support
user guidance that separates qualitative and quantitative reasoning.

There are techniques [12] that can compute the memory requirements
of object-oriented programs with region-based garbage collection.
These systems can handle loops but not recursive or composed
functions. We are only aware of two verified quantitative analysis
systems. Albert et al. [2] rely on the KeY tool to automatically verify
previously inferred loop invariants, size relations, and ranking functions
for Java Card programs. However, they do not have a formal
cost semantics and do not prove the bounds correct with respect to a
cost model. Blazy et al. [11] have verified a loop bound analysis for
CompCert's RTL intermediate language. However, this automatic
bound analysis does not compute symbolic bounds.

11.  Conclusion 

We have developed a novel analysis framework for compositional
and certified worst-case resource bound analysis for C programs.
The framework combines ideas from existing abstract interpretation-based
techniques with the potential method of amortized analysis. It
is implemented in the publicly available tool C4B. To the best of our
knowledge, C4B is the first tool for C programs that automatically
reduces the derivation of symbolic bounds to LP solving.

We have demonstrated that our approach improves the state of the art
in resource bound analysis for C programs in three ways.
First, our technique is naturally compositional, tracks size changes
of variables, and can abstractly specify the resource cost of functions
(Section 3). Second, it is easily combinable with established qualitative
verification to guide semi-automatic bound derivation (Section
6). Third, we have shown that the local inference rules of the derivation
system automatically produce easily checkable certificates for
the derived bounds (Section 7). Our system is the first amortized
resource analysis for C programs. It addresses the long-standing
open problem of extending automatic amortized resource analysis
to compute bounds for programs that loop on signed integers and to
deal with non-linear control flow.

This  work  is  the  starting  point  for  several  projects  that  we  plan 
to  investigate  in  the  future,  such  as  the  extension  to  concurrency, 
better integration of low-level features like memory caches, and
the  extension  of  the  automatic  analysis  to  multivariate  resource 
polynomials  [25]. 
A.  Complete Experimental Results for the Tool Comparison

Table 3.  Comparison of the bounds generated by KoAT, Rank, LOOPUS, SPEED, and our tool C4B on several challenging linear examples.
Results for KoAT and SPEED were extracted from previous publications [20-22, 38] because KoAT cannot take C programs as input in its
current version and SPEED is not available. Entries marked with ? indicate that we cannot test the respective example with the tool. Entries
marked with - indicate that the tool failed to produce a result. We write mx(a, b) for the maximum of a and b. Functions with names of the
form tXX are challenging tests that we designed during the development of C4B. The source code for all functions is available in the extended
version [16].

Function                        KoAT                    Rank                    LOOPUS                               SPEED                     C4B
gcd                             ?                       (((2+1)...  O(n)        -                                    ?                         |[0,x]| + |[0,y]|
kmp                             ?                       (((2+(n+...  O(n^2)     mx(n, 0)...  O(n)                    ?                         1 + 2|[0,n]|
qsort                           ?                       -                       -                                    ?                         1 + 2|[0,len]|
speed_pldi09_fig4_2             -                       (((2+n)...  O(n)        -                                    n/m + n                   1 + 2|[0,n]|
speed_pldi09_fig4_4             -                       (((2+(-1...  O(n)       -                                    n/m + m                   |[0,n]|
speed_pldi09_fig4_5             28d + 7g + 27  O(n)     (((2+(-1...  O(n)       -                                    mx(n, n-m)                -
speed_pldi10_ex1                -                       -                       -                                    n                         |[0,n]|
speed_pldi10_ex3                -                       (((2+(-1...  O(n)       2·mx(n, 0)  O(n)                     n                         |[0,n]|
speed_pldi10_ex4                110a + 33  O(n)         -                       -                                    n + 1                     1 + 2|[0,n]|
speed_popl10_fig2_1             9a + 9b + ...  O(n)     ((2+((-9...  O(n)       mx(0, n-x) + mx(0, m-y)  O(n)        mx(0, n-x) + mx(0, m-y)   |[x,n]| + |[y,m]|
speed_popl10_fig2_2             6a + 9b + 3c + 5  O(n)  ((2-x...  O(n)          mx(0, (x + 1-2)...  O(n)             mx(0, n-x) + mx(0, n-z)   |[x,n]| + |[z,n]|
speed_popl10_nested_multiple    -                       ((2-x+n...  O(n^2)      mx(0, m-y) + mx(0, n-x)  O(n)        mx(0, n-x) + mx(0, m-y)   |[x,n]| + |[y,m]|
speed_popl10_nested_single      48b + 16  O(n)          (((1-x+n...  O(n)       mx(0, n-1)...  O(n)                  n                         |[0,n]|
speed_popl10_sequential_single  21b + 6  O(n)           ((2-x+n...  O(n)        2·mx(n, 0)  O(n)                     n                         |[0,n]|
speed_popl10_simple_multiple    9c + 10d + 7  O(n)      ((2-y+m...  O(n)        mx(n, 0) + mx(m, 0)  O(n)            n + m                     |[0,m]| + |[0,n]|
speed_popl10_simple_single2     20d + 12c + 17  O(n)    -                       mx(n, 0) + mx(m, 0)  O(n)            n + m                     |[0,n]| + |[0,m]|
speed_popl10_simple_single      4b + 6  O(n)            ((2-x+n...  O(n)        mx(n, 0)  O(n)                       n                         |[0,n]|
t07                             ?                       2 x  O(n)               mx(x, 0)...  O(n)                    ?                         1 + 3|[0,x]| + |[0,y]|
t08                             ?                       ((2+2-9...  O(n)        mx(0, y-2)...  O(n)                  ?                         1.33|[x,y]| + 0.33|[0,x]|
t10                             ?                       ((2-y+x...  O(n)        mx(0, x-y)  O(n)                     ?                         |[y,x]|
t11                             ?                       ((2-y+m...  O(n)        mx(0, n-x) + mx(0, m-y)  O(n)        ?                         |[x,n]| + |[y,m]|
t13                             ?                       (((1+y^2/2...  O(n^2)   2·mx(x, 0) + mx(y, 0)  O(n)          ?                         2|[0,x]| + |[0,y]|
t15                             ?                       ((1+x...  O(n)          -                                    ?                         |[0,x]|
t16                             ?                       ((-99.9...  O(n)        -                                    ?                         101|[0,x]|
t19                             ?                       ((153+k...  O(n)        mx(0, i-100) + mx(0, k+i+51)  O(n)   ?                         50 + |[-1,i]| + |[0,k]|
t20                             ?                       (2-y+x...  O(n)         2·mx(0, y-x) + mx(0, x-y)  O(n)      ?                         |[x,y]| + |[y,x]|
t27                             ?                       -                       10^mx(0, -n)...  O(n)                ?                         0.01|[n,y]| + 11|[n,0]|
t28                             ?                       ((1-y+x...  O(n)        10^ mx(0, x-y)...  O(n)              ?                         |[x,0]| + |[0,y]| + 1002|[y,x]|
t30                             ?                       -                       -                                    ?                         |[0,x]| + |[0,y]|
t37                             ?                       -                       -                                    ?                         3 + 2|[0,x]| + |[0,y]|
t39                             ?                       -                       -                                    ?                         1.33 + 0.67|[z,y]|
t46                             ?                       -                       -                                    ?                         |[0,y]|
t47                             ?                       4 + n  O(n)             1 + mx(n, 0)  O(n)                   ?                         1 + |[0,n]|



